100+ datasets found
  1. Merge number of excel file,convert into csv file

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
    Explore at:
    Available download formats: zip (6731 bytes)
    Dataset updated
    Mar 30, 2024
    Authors
    Aashirvad pandey
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Project Description:

    Title: Pandas Data Manipulation and File Conversion

    Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

    Key Objectives:

    1. DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.
    2. Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.
    3. File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

    Tools and Libraries Used:

    • Python
    • Pandas

    Project Implementation:

    1. DataFrame Creation:

      • Import the Pandas library.
      • Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.
      • Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).
    2. Data Manipulation:

      • Add new columns to the DataFrame representing derived data or computations based on existing columns.
      • Filter the DataFrame to include only specific rows based on certain conditions.
      • Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.
    3. File Conversion:

      • Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.
      • Convert the DataFrame into a CSV (.csv) file using the to_csv() function.
      • Save the generated files to the local file system for further analysis or sharing.

    Expected Outcome:

    Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

    Conclusion:

    The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this dataset, we take several pieces of data, build DataFrames from them, save them into a single Excel file as differently named sheets, and then convert that Excel file into a CSV file.
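    A minimal pandas sketch of the workflow described above (file, sheet, and column names are illustrative assumptions):

    import pandas as pd

    # Build two sample DataFrames (illustrative data).
    df1 = pd.DataFrame({"name": ["Ana", "Raj"], "score": [88, 92]})
    df2 = pd.DataFrame({"name": ["Li", "Sam"], "score": [75, 81]})

    # Save both DataFrames into one Excel workbook as separate sheets
    # (requires an Excel engine such as openpyxl).
    with pd.ExcelWriter("merged.xlsx") as writer:
        df1.to_excel(writer, sheet_name="batch_1", index=False)
        df2.to_excel(writer, sheet_name="batch_2", index=False)

    # Read every sheet back and combine the workbook into a single CSV file.
    sheets = pd.read_excel("merged.xlsx", sheet_name=None)  # dict of DataFrames
    combined = pd.concat(sheets.values(), ignore_index=True)
    combined.to_csv("merged.csv", index=False)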

  2. Dataframe of Significant Stems.csv

    • psycharchives.org
    Updated Oct 8, 2019
    Cite
    (2019). Dataframe of Significant Stems.csv [Dataset]. https://www.psycharchives.org/en/item/84d5c4b2-579d-48a0-8d4e-f02f2ae99192
    Explore at:
    Dataset updated
    Oct 8, 2019
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Systematic reviews are the method of choice to synthesize research evidence. To identify main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. For this purpose, text mining and predictive modeling are very helpful. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, as a prototypical, loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach of text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling resulted in aesthetics and cultural activities on social media as a second hot spot, related to 4 of k = 8 identified topics. This way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses and avenues for future original research on QRD-ACE. Dataset for: Christ, A., Penthin, M., & Kröner, S. (2019). Big Data and Digital Aesthetic, Arts, and Cultural Education: Hot Spots of Current Quantitative Research. Social Science Computer Review, 089443931988845. https://doi.org/10.1177/0894439319888455

  3. Shopping Mall

    • kaggle.com
    zip
    Updated Dec 15, 2023
    Cite
    Anshul Pachauri (2023). Shopping Mall [Dataset]. https://www.kaggle.com/datasets/anshulpachauri/shopping-mall
    Explore at:
    Available download formats: zip (22852 bytes)
    Dataset updated
    Dec 15, 2023
    Authors
    Anshul Pachauri
    Description

    Libraries Import:

    Importing necessary libraries such as pandas, seaborn, matplotlib, scikit-learn's KMeans, and warnings.

    Data Loading and Exploration:

    Reading a dataset named "Mall_Customers.csv" into a pandas DataFrame (df). Displaying the first few rows of the dataset using df.head(). Conducting univariate analysis by calculating descriptive statistics with df.describe().

    Univariate Analysis:

    Visualizing the distribution of the 'Annual Income (k$)' column using sns.distplot. Looping through selected columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') and plotting individual distribution plots.

    Bivariate Analysis:

    Creating a scatter plot for 'Annual Income (k$)' vs 'Spending Score (1-100)' using sns.scatterplot. Generating a pair plot for selected columns with gender differentiation using sns.pairplot.

    Gender-Based Analysis:

    Grouping the data by 'Gender' and calculating the mean for selected columns. Computing the correlation matrix for the grouped data and visualizing it using a heatmap.

    Univariate Clustering:

    Applying KMeans clustering with 3 clusters based on 'Annual Income (k$)' and adding the 'Income Cluster' column to the DataFrame. Plotting the elbow method to determine the optimal number of clusters.

    Bivariate Clustering:

    Applying KMeans clustering with 5 clusters based on 'Annual Income (k$)' and 'Spending Score (1-100)' and adding the 'Spending and Income Cluster' column. Plotting the elbow method for bivariate clustering and visualizing the cluster centers on a scatter plot. Displaying a normalized cross-tabulation between 'Spending and Income Cluster' and 'Gender'.

    Multivariate Clustering:

    Performing multivariate clustering by creating dummy variables, scaling selected columns, and applying KMeans clustering. Plotting the elbow method for multivariate clustering.

    Result Saving:

    Saving the modified DataFrame with cluster information to a CSV file named "Result.csv". Saving the multivariate clustering plot as an image file ("Multivariate_figure.png").
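    A hedged sketch of the elbow method and bivariate clustering steps described above (column and file names follow the description; the KMeans settings are assumptions):

    import pandas as pd
    from sklearn.cluster import KMeans

    df = pd.read_csv("Mall_Customers.csv")
    X = df[["Annual Income (k$)", "Spending Score (1-100)"]]

    # Elbow method: compute inertia for k = 1..10 to choose the cluster count.
    inertias = [
        KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
        for k in range(1, 11)
    ]

    # Bivariate clustering with 5 clusters, as in the description.
    km = KMeans(n_clusters=5, n_init=10, random_state=42)
    df["Spending and Income Cluster"] = km.fit_predict(X)
    df.to_csv("Result.csv", index=False)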

  4. AI4Code Train Dataframe

    • kaggle.com
    zip
    Updated May 12, 2022
    Cite
    Darien Schettler (2022). AI4Code Train Dataframe [Dataset]. https://www.kaggle.com/datasets/dschettler8845/ai4code-train-dataframe
    Explore at:
    Available download formats: zip (622120487 bytes)
    Dataset updated
    May 12, 2022
    Authors
    Darien Schettler
    Description

    [EDIT/UPDATE]

    There are two important updates.

    1. When SAVING the pd.DataFrame as a .csv, the following command should be used to avoid improper interpretation of newline character(s).
    import csv  # provides csv.QUOTE_NONNUMERIC

    train_df.to_csv(
      "train.csv", index=False, 
      encoding='utf-8', 
      quoting=csv.QUOTE_NONNUMERIC  # <== THIS IS REQUIRED
    )
    
    2. When LOADING the .csv as a pd.DataFrame, the following command must be used to avoid misinterpretation of NaN-like strings (null, nan, ...) as pd.NaN values.
    import pandas as pd

    train_df = pd.read_csv(
      "/kaggle/input/ai4code-train-dataframe/train.csv", 
      keep_default_na=False  # <== THIS IS REQUIRED
    )
    
  5. TF 2.0 QA - Simplified - DataFrame

    • kaggle.com
    zip
    Updated Nov 12, 2019
    Cite
    Pavel Kovalets (2019). TF 2.0 QA - Simplified - DataFrame [Dataset]. https://www.kaggle.com/feanorpk/tf-20-qa-simplified-dataframe
    Explore at:
    Available download formats: zip (262240620 bytes)
    Dataset updated
    Nov 12, 2019
    Authors
    Pavel Kovalets
    Description

    Content

    This dataset was created from the TensorFlow 2.0 Question Answering primary dataset using this very handy utility script. The main differences from the original one are:

    • the structure is flattened to a simple DataFrame
    • long_answer_candidates were removed
    • only the first annotation is kept for both long and short answers (for the short answer this is a reasonable approximation, because there are very few samples with multiple short answers)

    Acknowledgements

    Thanks xhlulu for providing the utility script.

  6. HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, png, tiff
    Updated Apr 13, 2023
    Cite
    Emily CP Weiss; Rachel J Dutton (2023). HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities [Dataset]. http://doi.org/10.5281/zenodo.7613703
    Explore at:
    Available download formats: csv, png, tiff, bin
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Emily CP Weiss; Rachel J Dutton
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is associated with this HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities pub from Arcadia Science.

    HiPR-FISH spatial imaging was used to look at the distribution of microbes within five distinct microbial communities growing on the surface of aged cheeses. Probe design and imaging were performed by Kanvas Biosciences.

    This dataset includes the following:

    • For each field of view (roughly 135µm x 135µm; 7 FOVs per each cheese specimen):
      • A fluorescence intensity image (*_spectral_max_projection.png/.tif).
      • A pseudo-colored microbe-labeled image (*_identification.png/.tif).
      • A data frame containing each identified microbe's identity, position, and size (*_cell_information.csv).
      • A segmented mask for microbiota (*_segmentation.png/.tif)
      • A spatial proximity graph for species found close to one another, showing spatial enrichment relative to a random distribution (*_spatialheatmap.png).
      • A corresponding data frame used to generate the spatial proximity graph (*_absolute_spatial_association.csv) and dataframe for the average of 500 random shuffles of the taxa (*_randomized_spatial_association_matrix.csv).
    • For each cheese specimen:
      • A widefield image with FOVs located on the image (*_WF_overlay.png).
    • In general:
      • A png showing the color legend for each species. (ARC1_taxa_color_legend.png)
      • A data frame showing the environmental location of each FOV in the cheese (RIND/CURD) and the location of each FOV relative to FOV 1. (ARC1_Cheese_Map.csv).
      • A vignette showing an example of each cell and its false coloring according to its taxonomic identification (ARC1_detected_species_representative_cell_vignette.png).
      • Sequences used as input in probe design (16S_18S_forKanvas.fasta).
      • A CSV file containing the sequences that belong to each ASV (ARC1_sequences_to_ASVs.csv).
      • Plots of log-transformed counts for each microbe detected across all FOVs, and broken down for each cheese (*detected_species_absolute_abundance.png).
      • CSVs containing pairwise correlation of FOVs based on spatial association (ARC1_spatial_association_FOV_correlation.csv) and microbial abundance (ARC1_abundance_FOV_correlation.csv).
      • Plots of spatial association matrices, aggregated for different cheeses and different locations (RIND vs CURD) (*samples_*loc_relative_spatial_association.png).
      • CSV containing the principal component coordinates for each FOV (ARC1_abundance_FOV_PCA.csv, ARC1_spatial_association_FOV_PCA.csv).
      • CSV containing the mean fold-change in number of edges between each ASV and the corresponding p-value when compared to the null state (random spatial association matrices) (ARC1_spatial_enrichment_significance.csv).
  7. Longitudinal corpus of privacy policies

    • data.niaid.nih.gov
    Updated Dec 12, 2022
    Cite
    Wagner, Isabel (2022). Longitudinal corpus of privacy policies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5841138
    Explore at:
    Dataset updated
    Dec 12, 2022
    Dataset provided by
    University of Basel
    Authors
    Wagner, Isabel
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is a corpus of 56,416 unique privacy policy texts spanning the years 1996-2021.

    policy-texts.zip contains a directory of text files with the policy texts. File names are the hashes of the policy text.

    policy-metadata.zip contains two CSV files (can be imported into a pandas dataframe) with policy metadata including readability measures for each policy text.

    labeled-policies.zip contains CSV files with content labels for each policy. Labeling was done using a BERT classifier.

    Details on the methodology can be found in the accompanying paper.
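    As a hedged sketch, the metadata CSVs can be read straight from the archive with pandas (the member names inside the zip are not specified above, so they are discovered dynamically):

    import zipfile
    import pandas as pd

    # Read every CSV inside policy-metadata.zip into a DataFrame.
    with zipfile.ZipFile("policy-metadata.zip") as zf:
        csv_names = [n for n in zf.namelist() if n.endswith(".csv")]
        frames = {name: pd.read_csv(zf.open(name)) for name in csv_names}

    for name, frame in frames.items():
        print(name, frame.shape)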

  8. descriptor_prediction

    • huggingface.co
    Cite
    Yuanhao Qu, descriptor_prediction [Dataset]. https://huggingface.co/datasets/yhqu/descriptor_prediction
    Explore at:
    Authors
    Yuanhao Qu
    Description

    Descriptor Prediction Dataset

    This dataset is part of the Deep Principle Bench collection.

      Files
    

    descriptor_prediction.csv: Main dataset file

      Usage
    

    import pandas as pd
    from datasets import load_dataset

    # Load the dataset
    dataset = load_dataset("yhqu/descriptor_prediction")

    # Or load directly as a pandas DataFrame
    df = pd.read_csv("hf://datasets/yhqu/descriptor_prediction/descriptor_prediction.csv")

      Citation
    

    Please cite this work if you use… See the full description on the dataset page: https://huggingface.co/datasets/yhqu/descriptor_prediction.

  9. Exploring the Relationship between Lipid Profile Changes, Growth and...

    • zenodo.org
    Updated Nov 3, 2023
    + more versions
    Cite
    Saúl Fernandes; Diana Ilyaskina (2023). Exploring the Relationship between Lipid Profile Changes, Growth and Reproduction in Folsomia candida Exposed to Teflubenzuron Over Time [Dataset]. http://doi.org/10.5281/zenodo.10069317
    Explore at:
    Dataset updated
    Nov 3, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Saúl Fernandes; Diana Ilyaskina
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This submission provides CSV files with the data from a comprehensive study aimed at investigating the effects of sublethal concentrations of the insecticide teflubenzuron on the survival, growth, reproduction, and lipid changes of the Collembola Folsomia candida over different exposure periods.

    The dataset files are provided in CSV (comma-separated values) format:

    1. Survival_Growth_Reproduction_FolsomiaCandida_RawData.csv
    2. Main_Lipid_Class_Log10_RawData.csv
    3. Lipid_Categories_RawData.csv
    4. Bioaccumulation_SoilQuantification_RawData.csv

    Description of the files

    1. The CSV file Survival_Growth_Reproduction_FolsomiaCandida_RawData.csv contains the dataframe in vertical format with the data for survival of Folsomia candida, changes in biomass, and reproduction.
    2. The CSV file Main_Lipid_Class_Log10_RawData.csv provides the dataframe in horizontal format with the log10-transformed total lipid content of the main lipid classes in Folsomia candida. These data were used to produce Figure 5 of the manuscript. Full names of the lipid abbreviations are provided in the supplementary information of the manuscript.
    3. The CSV file Lipid_Categories_RawData.csv provides the dataframe with lipid categories in Folsomia candida. These data were not used in the data analysis described in the manuscript.
    4. The CSV file Bioaccumulation_SoilQuantification_RawData.csv contains the dataframe in vertical format with the data for quantification of the insecticide in the soil and in the animals, and the calculation of the bioaccumulation factor.

    Variables in the files:

    File 1:

    • sample: sample unique ID
    • days: day of sampling
    • dose: dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
    • solvent: solvent used in the soil (acetone or water)
    • age: age of the animals Folsomia candida at the day of sampling
    • survival: number of surviving adults of Folsomia candida
    • total.biomass(mg): total biomass in mg of the pool of animals in each sample
    • biomass.individual(ng): "total.biomass(mg)" divided by "survival" and converted to ng
    • offspring: number of offspring produced in each sample (NA for samples where this number could not be assessed)

    Files 2 and 3:

    • sample: sample unique ID
    • days: day of sampling
    • dose: dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)

    File 4:

    • sample: sample unique ID
    • days: day of sampling
    • dose.nominal(ng/g): nominal dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
    • dose.measured(ng/g): measured dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
    • biomass.total.wet(g): "total.biomass(mg)" divided by "survival" and converted to ng
    • number.animals: number of surviving adults of Folsomia candida in each sample
    • biomass.individual.dry(g): "total.biomass(mg)" divided by "number.animals" and converted to ng
    • measured.insecticide.animals(ng): measured amount of teflubenzuron (insecticide) in the pool of animals (mg a.s. kg soil -1)
    • accumulation.insecticide(ng/g of dry body weight): "measured.insecticide.animals(ng)" divided by "biomass.total.wet(g)"
    • baf: "accumulation.insecticide" divided by "dose.measured(ng/g)"

    [NA stands for samples lost/ not measured]
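    As a hedged illustration of how the derived per-individual biomass column above is computed (column names follow the data dictionary; the unit conversion assumes 1 mg = 1,000,000 ng):

    import pandas as pd

    df = pd.read_csv("Survival_Growth_Reproduction_FolsomiaCandida_RawData.csv")

    # biomass.individual(ng) = total.biomass(mg) / survival, converted to ng.
    df["biomass.individual(ng)"] = df["total.biomass(mg)"] / df["survival"] * 1e6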

    This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 859891.

    This publication reflects only the authors' view and the European Commission is not responsible for any use that may be made of the information it contains.

  10. National Water Model RouteLinks CSV

    • beta.hydroshare.org
    • hydroshare.org
    • +2more
    zip
    Updated Oct 15, 2021
    Cite
    Jason A Regina; Austin Raney (2021). National Water Model RouteLinks CSV [Dataset]. http://doi.org/10.4211/hs.d154f19f762c4ee9b74be55f504325d3
    Explore at:
    Available download formats: zip (1.1 MB)
    Dataset updated
    Oct 15, 2021
    Dataset provided by
    HydroShare
    Authors
    Jason A Regina; Austin Raney
    License

    MIT License (https://mit-license.org/)

    Time period covered
    Apr 12, 2019 - Oct 14, 2021
    Description

    This resource contains "RouteLink" files for version 2.1.6 of the National Water Model which are used to associate feature identifiers for computational reaches to relevant metadata. These data are important for comparing NWM feature data to USGS streamflow and lake observations. The original RouteLink files are in NetCDF format and available here: https://www.nco.ncep.noaa.gov/pmb/codes/nwprod

    This resource includes the files in a human-friendlier CSV format for easier use, and a machine-friendlier file in HDF5 format which contains a single pandas.DataFrame. The scripts and supporting utilities are also included for users that wish to rebuild these files. Source code is hosted here: https://github.com/jarq6c/NWM_RouteLinks
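    A hedged sketch for loading the machine-friendly file (the file name below is an assumption; because the HDF5 file stores a single pandas.DataFrame, no key argument is needed):

    import pandas as pd

    # Requires the 'tables' (PyTables) package for HDF5 support.
    routelinks = pd.read_hdf("routelink.h5")  # file name is an assumption
    print(routelinks.head())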

  11. Restaurant Dish Orders in Power BI

    • kaggle.com
    zip
    Updated Oct 30, 2024
    Cite
    Fords (2024). Restaurant Dish Orders in Power BI [Dataset]. https://www.kaggle.com/datasets/fords001/restaurant-dish-orders
    Explore at:
    Available download formats: zip (620177 bytes)
    Dataset updated
    Oct 30, 2024
    Authors
    Fords
    License

    CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    In this data analysis, I used the 'Restaurant Orders' dataset from https://mavenanalytics.io/data-playground, which is released under a Public Domain license. Public-domain work is free for anyone to use for any purpose without restriction under copyright law; it is the most open/free form of licensing, since no one owns or controls the material in any way. The dataset contains 3 dataframes in CSV format: 'restaurant_db_data_dictionary.csv', which describes the relationships between the tables; 'order_details.csv', with columns order_details_id, order_id, order_date, order_time, item_id; and 'menu_items.csv', with columns menu_item_id, item_name, category, price.

    Using these 3 dataframes, we will create a new dataframe 'order_details_table' (the result dataframe in the Power BI file restaurant_orders_result.pbix). Based on this new dataframe, we will generate various chart visualizations in the file restaurant_orders_result_charts.pbix and also attach the charts here. Below is a more detailed description of how I created the new dataframe 'order_details_table' and the visualizations, including bar charts and pie charts.

    I will use Power BI in this project.

    1. Delete all rows where the value is 'NULL' in the column 'item_id' of the dataframe 'order_details'. For this, I use the Power Query Editor and the 'Keep Rows' function, keeping all rows except the 'NULL' values.
    2. Combine the two columns 'order_date' and 'order_time' into one column 'order_date_time' in the format MM/DD/YY HH:MM:SS.
    3. Merge two dataframes into one dataframe 'order_details_table' using the 'Merge Queries' function in the Power Query Editor, choosing an inner join (only matching rows). In 'restaurant_db_data_dictionary.csv' we find that the column 'item_id' in the 'order_details' table matches 'menu_item_id' in the 'menu_items' table, so we combine the two tables on the common id columns 'menu_item_id' and 'item_id'.
    4. Remove the columns we don't need and create a new 'order_id' with a unique number for each order.

    (An equivalent pandas sketch is shown after the column summary below.)

    As a result, we have 6 columns in the new dataframe 'order_details_table': order_details_id (a unique identifier for each dish within an order), order_id (the unique identifier for each order or transaction), order_date_time (the date when the order was created, in the format MM/DD/YY HH:MM:SS), menu_item_category (the category to which the dish belongs), menu_item_name (the name of the dish on the menu), and menu_item_price (the price of the dish).
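    The steps above were performed in Power Query; purely as an illustration, here is a hedged pandas equivalent (column names follow the description):

    import pandas as pd

    orders = pd.read_csv("order_details.csv")
    menu = pd.read_csv("menu_items.csv")

    # 1. Drop rows whose item_id is missing or the literal 'NULL'.
    orders = orders[orders["item_id"].notna() & (orders["item_id"] != "NULL")]

    # 2. Combine order_date and order_time into one datetime column.
    orders["order_date_time"] = pd.to_datetime(
        orders["order_date"].astype(str) + " " + orders["order_time"].astype(str)
    )

    # 3. Inner join with menu_items on item_id == menu_item_id.
    result = orders.merge(menu, left_on="item_id", right_on="menu_item_id", how="inner")
    result.to_csv("order_details_table.csv", index=False)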

    [Screenshot: table 'order_details_table' from the Power BI file restaurant_orders_result.pbix (dataframe.png)]

    I have also created bar charts and pie charts to display the results from the new dataframe. These plots are included in the file 'restaurant_orders_result_charts.pbix', and you can find pictures of the charts below.

    [Screenshots: picture_1.png through picture_4.png, bar charts and pie charts generated from 'order_details_table']

    I also attached the original and new files to this project, thank you.

  12. Myrstener et al. (2025) Downstream temperature effects of boreal forest...

    • researchdata.se
    • su.figshare.com
    Updated Feb 17, 2025
    + more versions
    Cite
    Caroline Greiser; Lenka Kuglerová; Maria Myrstener (2025). Myrstener et al. (2025) Downstream temperature effects of boreal forest clearcutting vary with riparian buffer width - Data and Code [Dataset]. http://doi.org/10.17045/STHLMUNI.27188004
    Explore at:
    Dataset updated
    Feb 17, 2025
    Dataset provided by
    Stockholm University
    Authors
    Caroline Greiser; Lenka Kuglerová; Maria Myrstener
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Please read the readme.txt!

    This repository contains raw and clean data (.csv), as well as the R scripts (.r) that process the data and create the plots and models.

    We recommend going through the R scripts in chronological order.

    Code was developed in the R software:

    R version 4.4.1 (2024-06-14 ucrt) -- "Race for Your Life"
    Copyright (C) 2024 The R Foundation for Statistical Computing
    Platform: x86_64-w64-mingw32/x64

    ****** List of files ********************************

    • Data

    ---raw

    72 files from 72 Hobo data loggers

    names: site_position_medium.csv

    example: "20_20_down_water.csv" (site = 20, position = 20 m downstream, medium = water)

    ---clean

    site_logger_position_medium.csv: list of all sites, their loggers, their position, and the medium in which they were placed

    loggerdata_compiled.csv: all raw logger data (see above) compiled into one dataframe; for column names see below

    Daily_loggerdata.csv: all data aggregated to daily mean, max, and min values; for column names see below

    CG_site_distance_pairs.csv: all logger positions for each stream and their pairwise geographical distance in meters

    Discharge_site7.csv: discharge data for the same season as the logger data, from a reference stream

    buffer_width_eniro_CG.csv: measured and averaged buffer widths for each of the studied streams (in m)

    • Scripts

    01_compile_clean_loggerdata.r

    02_aggregate_loggerdata.r

    03_model_stream_temp_summer.r

    03b_model_stream_temp_autumn.r

    04_calculate_warming_cooling_rates_summer.r

    04b_calculate_warming_cooling_rates_autumn.r

    05_model_air_temp_summer.r

    05b_model_air_temp_autumn.r

    06_plot_representative_time_series_temp_discharge.r

    ****** Column names ********************************

    Most column names are self-explanatory and are also explained in the R code.

    Below is some detailed info on two dataframes (.csv); the column names are similar in the other CSV files.

    File "loggerdata_compiled.csv" [in Data/clean/ ]

    "Logger.SN" Logger serial number

    "Timestamp" Datetime, YYYY-MM-DD HH:MM:SS

    "Temp" temperature in °C

    "Illum" light in lux

    "Year" YYYY

    "Month" MM

    "Day" DD

    "Hour" HH

    "Minute" MM

    "Second" SS

    "tz" time zone

    "path" file path

    "site" stream/site ID

    "file" file name

    "medium" "water" or "air"

    "position" one of 6 positions along the stream: up, mid, end, 20, 70, 150

    "date" YYYY-MM-DD

    File "Daily_loggerdata.csv" [in Data/clean/ ]

    "date" ... (see above)

    "Logger.SN" Logger serial number

    "mean_temp" mean daily temperature

    "min_temp" minimum daily temperature

    "max_temp" maximum daily temperature

    "path" ...

    "site" ...

    "file" ...

    "medium" ...

    "position" ...

    "buffer" one of 3 buffer categories: no, thin, wide

    "Temp.max.ref" maximum daily temperature of the upstream reference logger

    "Temp.min.ref" minimum daily temperature of the upstream reference logger

    "Temp.mean.ref" mean daily temperature of the upstream reference logger

    "Temp.max.dev" max. temperature difference to upstream reference

    "Temp.min.dev" min. temperature difference to upstream reference

    "Temp.mean.dev" mean temperature difference to upstream reference

    Paper abstract:

    Clearcutting increases temperatures of forest streams, and in temperate zones, the effects can extend far downstream. Here, we studied whether similar patterns are found in colder, boreal zones and if riparian buffers can prevent stream water from heating up. We recorded temperature at 45 locations across nine streams with varying buffer widths. In these streams, we compared upstream (control) reaches with reaches in clearcuts and up to 150 m downstream. In summer, we found daily maximum water temperature increases on clearcuts up to 4.1 °C with the warmest week ranging from 12.0 to 18.6 °C. We further found that warming was sustained downstream of clearcuts to 150 m in three out of six streams with buffers < 10 m. Surprisingly, temperature patterns in autumn resembled those in summer, yet with lower absolute temperatures (maximum warming was 1.9 °C in autumn). Clearcuts in boreal forests can indeed warm streams, and because these temperature effects are propagated downstream, we risk catchment-scale effects and cumulative warming when streams pass through several clearcuts. In this study, riparian buffers wider than 15 m protected against water temperature increases; hence, we call for a general increase of riparian buffer width along small streams in boreal forests.
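    The processing scripts listed above are written in R; for readers working in Python, a hedged pandas sketch of the daily aggregation step (column names follow the listing above):

    import pandas as pd

    logs = pd.read_csv("loggerdata_compiled.csv", parse_dates=["Timestamp"])
    logs["date"] = logs["Timestamp"].dt.date

    # Daily mean/min/max temperature per logger, mirroring 02_aggregate_loggerdata.r.
    daily = (
        logs.groupby(["Logger.SN", "date"])["Temp"]
        .agg(mean_temp="mean", min_temp="min", max_temp="max")
        .reset_index()
    )
    daily.to_csv("Daily_loggerdata.csv", index=False)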

  13. HPA - Processed Train Dataframe With Cell-Wise RLE

    • kaggle.com
    zip
    Updated Feb 9, 2021
    Cite
    Darien Schettler (2021). HPA - Processed Train Dataframe With Cell-Wise RLE [Dataset]. https://www.kaggle.com/dschettler8845/hpa-processed-train-dataframe-with-cellwise-rle
    Explore at:
    Available download formats: zip (1111131078 bytes)
    Dataset updated
    Feb 9, 2021
    Authors
    Darien Schettler
    Description

    This is a CSV file after some minor preprocessing (one-hot-expansion, etc.) that also includes all the RLEs and Bounding Boxes as a list for each respective ID.

    The individual RLEs in the list correspond to cells in the given image, as do the individual bounding boxes.

    The RLE and Bounding Box are ordered to refer to the same respective cell.
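    Because the RLEs and bounding boxes are stored as lists inside CSV cells, pandas reads them back as strings; a hedged sketch for parsing them (the file and column names are assumptions; check the actual CSV header):

    import ast
    import pandas as pd

    df = pd.read_csv("train_processed.csv")  # file name is an assumption

    # List-valued cells come back as strings; parse them into real Python lists.
    for col in ("rles", "bboxes"):  # column names are assumptions
        df[col] = df[col].apply(ast.literal_eval)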

  14. property_based_matching

    • huggingface.co
    Cite
    Yuanhao Qu, property_based_matching [Dataset]. https://huggingface.co/datasets/yhqu/property_based_matching
    Explore at:
    Authors
    Yuanhao Qu
    Description

    Property Based Matching Dataset

    This dataset is part of the Deep Principle Bench collection.

      Files
    

    property_based_matching.csv: Main dataset file

      Usage
    

    import pandas as pd
    from datasets import load_dataset

    # Load the dataset
    dataset = load_dataset("yhqu/property_based_matching")

    # Or load directly as a pandas DataFrame
    df = pd.read_csv("hf://datasets/yhqu/property_based_matching/property_based_matching.csv")

      Citation
    

    Please cite this work if… See the full description on the dataset page: https://huggingface.co/datasets/yhqu/property_based_matching.

  15. GENEActiv accelerometer file related to the #120 OxWearables / stepcount...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 25, 2024
    Cite
    Wattelez, Guillaume (2024). GENEActiv accelerometer file related to the #120 OxWearables / stepcount issue [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11557420
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    University of New Caledonia
    Authors
    Wattelez, Guillaume
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    An example of a .bin file that raises an IndexError when processed.

    See the #120 OxWearables / stepcount issue for more details.

    The .csv files are 1-second epoch conversions from the .bin file and contain time, x, y, z columns. The conversion was done by:

    • reading the .bin with the GENEAread R package,
    • keeping only the time, x, y and z columns,
    • saving the data.frame into a .csv file.

    The only difference between the .csv files is the column format used for the time column before saving:

    • the time column in XXXXXX_....csv had a string class
    • the time column in XXXXXT....csv had a "POSIXct" "POSIXt" class
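    A hedged sketch for loading one of the 1-second epoch CSVs in pandas (the file name below is a placeholder for the elided XXXXXX_... / XXXXXT... patterns):

    import pandas as pd

    # Placeholder file name; adjust parsing to the export's time format.
    epochs = pd.read_csv("geneactiv_epochs.csv")
    epochs["time"] = pd.to_datetime(epochs["time"])
    print(epochs[["time", "x", "y", "z"]].head())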

  16. oldIT2modIT

    • huggingface.co
    Cite
    Massimo Romano, oldIT2modIT [Dataset]. https://huggingface.co/datasets/cybernetic-m/oldIT2modIT
    Explore at:
    Authors
    Massimo Romano
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Download the dataset

    At the moment, to download the dataset you should use a pandas DataFrame:

    import pandas as pd
    df = pd.read_csv("https://huggingface.co/datasets/cybernetic-m/oldIT2modIT/resolve/main/oldIT2modIT_dataset.csv")

    You can visualize the dataset with:

    df.head()

    To convert it into a Huggingface dataset:

    from datasets import Dataset
    dataset = Dataset.from_pandas(df)

      Dataset Description
    

    This is an Italian dataset formed by 200 old (ancient) Italian sentences and… See the full description on the dataset page: https://huggingface.co/datasets/cybernetic-m/oldIT2modIT.

  17. Data for: Can government transfers make energy subsidy reform socially...

    • data.mendeley.com
    Updated Mar 31, 2020
    Cite
    Filip Schaffitzel (2020). Data for: Can government transfers make energy subsidy reform socially acceptable? A case study on Ecuador [Dataset]. http://doi.org/10.17632/z35m76mf9g.1
    Explore at:
    Dataset updated
    Mar 31, 2020
    Authors
    Filip Schaffitzel
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) (https://creativecommons.org/licenses/by-nc/3.0/)
    License information was derived automatically

    Area covered
    Ecuador
    Description

    Estimating the distributional impacts of energy subsidy removal and compensation schemes in Ecuador based on input-output and household data.

    Import files:

    • Dictionary Categories.csv, Dictionary ENI-IOT.csv, and Dictionary Subcategories.csv, based on [1]
    • Dictionary IOT.csv and IOT_2012.csv (cannot be redistributed), based on [2]
    • Dictionary Taxes.csv and Dictionary Transfers.csv, based on [3]
    • ENIGHUR11_GASTOS_V.csv, ENIGHUR11_HOGARES_AGREGADOS.csv, and ENIGHUR11_PERSONAS_INGRESOS.csv, based on [4]
    • Price increase scenarios.csv, based on [5]

    Further basic files and documents:

    [1] 4_M&D_Mapping ENIGHUR expenditures to IOT_180605.xlsm
    [2] Input-output table 2012 (https://contenido.bce.fin.ec/documentos/PublicacionesNotas/Catalogo/CuentasNacionales/Anuales/Dolares/MIP2012Ampliada.xls). Save the sheet with the IOT 2012 (Matriz simétrica) as IOT_2012.csv and edit the format so that the first column and row contain the IOT labels.
    [3] 4_M&D_ENIGHUR income_180606.xlsx
    [4] ENIGHUR data can be retrieved from http://www.ecuadorencifras.gob.ec/encuesta-nacional-de-ingresos-y-gastos-de-los-hogares-urbanos-y-rurales/ . Household datasets are only available in SPSS file format, and the free software PSPP is used to convert .sav files to .csv files, as this format can be read directly and efficiently into a Python pandas DataFrame. See the PSPP syntax below:

    save translate
     /outfile = filename
     /type = CSV
     /textoptions decimal = DOT
     /textoptions delimiter = ';'
     /fieldnames
     /cells=values
     /replace.

    [5] 3_Ecuador_Energy subsidies and 4_M&D_Price scenarios_180610.xlsx
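    As a hedged sketch, a CSV exported with the PSPP syntax above (';' delimiter, '.' decimal mark) can be read into pandas like this (file name from the list above):

    import pandas as pd

    # Semicolon-delimited export with dot decimals, per the PSPP options above.
    hogares = pd.read_csv("ENIGHUR11_HOGARES_AGREGADOS.csv", sep=";", decimal=".")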

  18. Drosophila central brain connectivity

    • figshare.com
    application/x-gzip
    Updated May 30, 2023
    Cite
    Max Turner (2023). Drosophila central brain connectivity [Dataset]. http://doi.org/10.6084/m9.figshare.13349282.v3
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Max Turner
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    See https://github.com/mhturner/SC-FC for analysis code and figure generation using this dataset.

    • StructuralMatrix_branson.csv and CorrelationMatrix_Branson.csv are the fine-grained Branson-segmentation-based structural and functional connectivity matrices shown in Fig. S3.
    • body_ids.csv is the list of unique body ID numbers in the Hemibrain connectome that were used to compute structural connectivity.

    data_TurnerMannClandinin: compressed directory containing these subdirectories:

    • atlas_data contains the original Ito and Branson brain atlas/segmentation files.
    • ito_68_atlas & branson_999_atlas each contain .nii.gz image files containing the registered brain atlas mask for each fly. The mask numbers correspond to the regions as in Original_Index_panda_full.csv and atlas_roi_values.
    • ito_responses & branson_responses each contain pandas dataframes, one for each fly, describing raw fluorescence responses for each brain region in that atlas. The sampling rate is 1.2 Hz.
    • connectome_connectivity contains computed pandas dataframe files for various structural connectivity metrics, for the regions included in the main paper analysis.
    • hemi_2_atlas contains results of structural connectivity computations from R code included in the corresponding GitHub repository.
    • subsample contains numpy data resulting from the region subsampling analysis.
    • template_brains contains template brain files in the original (JFRC2) brain template space as well as the same atlases transformed to JRC2018 space (in both .tif and .nii.gz format).

    registration.xform.xip: transform file (CMTK format) to convert from JFRC2 template brain space, where the Ito atlas and Branson segmentation are, to JRC2018 template brain space.

  19. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Oct 20, 2022
    + more versions
    Cite
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
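    Once restored, the collections can be pulled into pandas for analysis; a hedged sketch using pymongo (host, port, database, and collection names follow the commands above):

    import pandas as pd
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    db = client["rais_anonymized"]

    # Load a sample of the fitbit collection into a DataFrame for inspection.
    docs = list(db.fitbit.find().limit(1000))
    df = pd.DataFrame(docs)
    print(df.columns)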

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  20. Philaenus spumarius and other meadow spittlebugs in Trentino. Italy

    • data.mendeley.com
    Updated Sep 20, 2021
    + more versions
    Cite
    Sabina Avosani (2021). Philaenus spumarius and other meadow spittlebugs in Trentino. Italy [Dataset]. http://doi.org/10.17632/7rv4czkykr.1
    Explore at:
    Dataset updated
    Sep 20, 2021
    Authors
    Sabina Avosani
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Italy
    Description

    This dataset is associated with the article "Occupancy and detection of agricultural threats: the case of Philaenus spumarius, European vector of Xylella fastidiosa" by the same authors, published in JOURNAL 2021. The data about Philaenus spumarius and other co-occurring species were collected in Trentino, Italy, during spring and summer 2018 in olive orchards and vineyards. Here we provide the raw data, some preprocessed data, and the R code that we used for the analysis presented in the publication. Please refer to the above-mentioned article for more details.

    List of files:

    samplings.xlsx: original dataset of field sampling (sheet: survey), site coordinates and info (sheet: info site), and metadata (sheet: legenda)
    counts_per_site.csv: occupancy abundance dataframe for P. spumarius
    philaenus_occupancy_data.csv: occupancy presence dataframe for P. spumarius
    sites.cov.csv: site covariates for occupancy model
    observation.cov.csv: observation covariates for occupancy model
    Rcode.zip: commented code and data in R format to run occupancy models for P. spumarius
