100+ datasets found
  1. Data pipeline Validation And Load Testing using Multiple CSV Files

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Mar 26, 2021
    Cite
    Mainak Adhikari; Afsana Khan; Pelle Jakovits (2021). Data pipeline Validation And Load Testing using Multiple CSV Files [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4636797
    Dataset updated
    Mar 26, 2021
    Dataset provided by
    Research Fellow, University of Tartu
    Masters Student, University of Tartu
    Lecturer, University of Tartu
    Authors
    Mainak Adhikari; Afsana Khan; Pelle Jakovits
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets were used to validate and test a data pipeline deployment that follows the RADON approach. The source is a single CSV file containing around 32,000 tweets. It was split into 100 CSV files of 320 tweets each, and those 100 files are used to validate and test (performance/load testing) the data pipeline components.
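
The split described above can be sketched as follows. This is an illustration rather than the authors' script: the function name `split_csv`, the `part_NNN.csv` naming scheme, and the presence of a header row in the source file are assumptions.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_file=320):
    """Split one CSV into numbered files of at most rows_per_file data rows.

    Assumes (hypothetically) that the first row of `source` is a header,
    which is repeated at the top of every output file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []

    with open(source, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
                chunk = []
        if chunk:  # leftover rows that do not fill a whole file
            written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
    return written


def _write_chunk(out_dir, index, header, rows):
    """Write one chunk to out_dir/part_<index>.csv and return its path."""
    path = out_dir / f"part_{index:03d}.csv"
    with open(path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

With a 32,000-row source and the default of 320 rows per file, this yields exactly 100 output files.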

  2. Comparison of Data Import Methods: Speed, Flexibility, and Data Handling

    • blog.devart.com
    html
    Updated Mar 27, 2025
    Cite
    Devart (2025). Comparison of Data Import Methods: Speed, Flexibility, and Data Handling [Dataset]. https://blog.devart.com/import-data-quickbooks-online-odbc-driver.html
    Available download formats: html
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Devart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Speed, Method, Flexibility, Data Handling Capabilities
    Description

    A comparison of common data import methods including ODBC, CSV uploads, QuickBooks Integration API, and third-party apps, focusing on speed, flexibility, and data handling capabilities.

  3. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Cite
    Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Available download formats: bin, application/gzip, zip, text/x-python
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)

    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on the level of detail (see Step 2 for more details):
    - up to 2Tb of disk space (see the levels of detail in Step 2)
    - at least 16Gb of RAM (64Gb preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       cd ghd
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt` . 
    - disable all APIs except GitHub (Bitbucket and Gitlab support were
     not yet implemented when this study was in progress): edit
     `scraper/__init__.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speed up
    the process:
    
    #### Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    #### Option 2.b, difficulty level: easy

    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15 to 30 minutes.
    
    - create a folder `
  4. Adventure Works 2022 CSVs

    • kaggle.com
    zip
    Updated Nov 2, 2022
    Cite
    Algorismus (2022). Adventure Works 2022 CSVs [Dataset]. https://www.kaggle.com/datasets/algorismus/adventure-works-in-excel-tables
    Available download formats: zip (567646 bytes)
    Dataset updated
    Nov 2, 2022
    Authors
    Algorismus
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Adventure Works 2022 dataset

    How was this dataset created?

    On the official website, the dataset is available through SQL Server (localhost) and as CSVs to be used via Power BI Desktop running in the Virtual Lab (virtual machine). The first two steps of importing data were executed in the virtual lab, and the resulting Power BI tables were then copied into CSVs. Records were added up to the year 2022 as required.

    How may this dataset help you?

    This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop and follow the lab instructions from the training material on the official website. It is also useful if you want to work through the Power BI Desktop Sales Analysis example from the PL-300 learning material on the Microsoft website.

    How to use this dataset?

    Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created by the first two data-import steps in the PL-300 Microsoft Power BI Data Analyst exam lab.

  5. Data from: Data and code from: Environmental influences on drying rate of...

    • catalog.data.gov
    • datasetcatalog.nlm.nih.gov
    • +2more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data and code from: Environmental influences on drying rate of spray applied disinfestants from horticultural production services [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-environmental-influences-on-drying-rate-of-spray-applied-disinfestants-
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    This dataset includes all the data and R code needed to reproduce the analyses in a forthcoming manuscript:

    Copes, W. E., Q. D. Read, and B. J. Smith. Environmental influences on drying rate of spray applied disinfestants from horticultural production services. PhytoFrontiers, DOI pending.

    Study description: Instructions for disinfestants typically specify a dose and a contact time to kill plant pathogens on production surfaces. A problem occurs when disinfestants are applied to large production areas where the evaporation rate is affected by weather conditions. The common contact time recommendation of 10 min may not be achieved under hot, sunny conditions that promote fast drying. This study is an investigation into how the evaporation rates of six commercial disinfestants vary when applied to six types of substrate materials under cool to hot and cloudy to sunny weather conditions. Initially, disinfestants with low surface tension spread out to provide 100% coverage and disinfestants with high surface tension beaded up to provide about 60% coverage when applied to hard smooth surfaces. Disinfestants applied to porous materials were quickly absorbed into the body of the material, such as wood and concrete. Even though disinfestants evaporated faster under hot sunny conditions than under cool cloudy conditions, coverage was reduced considerably in the first 2.5 min under most weather conditions and reduced to less than or equal to 50% coverage by 5 min.

    Dataset contents: This dataset includes R code to import the data and fit Bayesian statistical models using the model fitting software CmdStan, interfaced with R using the packages brms and cmdstanr. The models (one for 2022 and one for 2023) compare how quickly different spray-applied disinfestants dry, depending on what chemical was sprayed, what surface material it was sprayed onto, and what the weather conditions were at the time. Next, the statistical models are used to generate predictions and compare mean drying rates between the disinfestants, surface materials, and weather conditions. Finally, tables and figures are created.

    These files are included:

    • Drying2022.csv: drying rate data for the 2022 experimental run
    • Weather2022.csv: weather data for the 2022 experimental run
    • Drying2023.csv: drying rate data for the 2023 experimental run
    • Weather2023.csv: weather data for the 2023 experimental run
    • disinfestant_drying_analysis.Rmd: RMarkdown notebook with all data processing, analysis, and table creation code
    • disinfestant_drying_analysis.html: rendered output of notebook
    • MS_figures.R: additional R code to create figures formatted for journal requirements
    • fit2022_discretetime_weather_solar.rds: fitted brms model object for 2022. This will allow users to reproduce the model prediction results without having to refit the model, which was originally fit on a high-performance computing cluster
    • fit2023_discretetime_weather_solar.rds: fitted brms model object for 2023
    • data_dictionary.xlsx: descriptions of each column in the CSV data files

  6. Csv Marketing Export Import Data | Eximpedia

    • eximpedia.app
    Updated Oct 17, 2025
    + more versions
    Cite
    (2025). Csv Marketing Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/companies/csv-marketing/81787381
    Dataset updated
    Oct 17, 2025
    Description

    Csv Marketing Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  7. converted json to CSV Traffy Fondue data

    • kaggle.com
    zip
    Updated Jan 15, 2025
    Cite
    Hansen (2025). converted json to CSV Traffy Fondue data [Dataset]. https://www.kaggle.com/datasets/motethansen/converted-json-to-csv-traffy-fondue-data
    Available download formats: zip (31705770 bytes)
    Dataset updated
    Jan 15, 2025
    Authors
    Hansen
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Traffy Fondue Data

    Data pulled from Traffy Fondue by accessing the Traffy Fondue Open API. Date range: January 2022 until January 2025.

    The following code pulled the data:

    
    import os
    import json
    import requests
    from datetime import datetime, timedelta
    import time
    
    class TraffyDataFetcher:
      def __init__(self, start_date, subfolder='traffyfonduedata'):
        self.url = "https://publicapi.traffy.in.th/share/teamchadchart/search"
        self.query = {'offset': '0'}
        self.payload = {}
        self.headers = {}
        self.start_date = datetime.strptime(start_date, '%Y-%m-%d')
        self.end_date = datetime.now()
        self.subfolder = subfolder
        self.max_requests_per_minute = 99
    
        if not os.path.exists(self.subfolder):
          os.makedirs(self.subfolder)
    
      def add_days_to_date(self, start_date_str, days_to_add):
        start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
        new_date = start_date + timedelta(days=days_to_add)
        return new_date.strftime('%Y-%m-%d')
    
      def fetch_data(self):
        current_date = self.start_date
        index = 0
    
        while current_date <= self.end_date:
          start_time = datetime.now()
    
          self.query['start'] = current_date.strftime('%Y-%m-%d')
          new_date = self.add_days_to_date(self.query['start'], 10)
          self.query['end'] = new_date
          response = requests.request("GET", self.url, headers=self.headers, data=self.payload, params=self.query)
          print(f"offset: {index} response: {response.status_code}")
    
          filename = f"traffy_{current_date.strftime('%Y-%m-%d')}.json"
          file_path = os.path.join(self.subfolder, filename)
    
          with open(file_path, "w") as outfile:
            json_object = json.dumps(response.json(), indent=4)
            outfile.write(json_object)
    
          end_time = datetime.now()
          elapsed_time = (end_time - start_time).total_seconds()
          print(f"Elapsed time: {elapsed_time} s")
    
          index += 950
          current_date = datetime.strptime(new_date, '%Y-%m-%d') + timedelta(days=1)
    
          if index % self.max_requests_per_minute == 0:
            # Clamp at zero: time.sleep() raises ValueError for a negative delay
            time.sleep(max(0, 60 - elapsed_time))
    
    if __name__ == "__main__":
      fetcher = TraffyDataFetcher(start_date='2022-01-01')
      fetcher.fetch_data()
    

    --

    And the following code converted the JSON to CSV files:

    import os
    import glob
    import json
    import pandas as pd
    #import numpy as np
    
    class TraffyJSONFixer:
      def __init__(self, path_to_json='*.json', subfolder='traffyfonduedata'):
        self.path_to_json = path_to_json
        self.subfolder = subfolder
        self.outputfolder = 'fixedjson'
        self.excelfolder = 'exceloutput'
        self.file_path = os.path.join(self.subfolder, self.path_to_json)
        self.json_files = glob.glob(self.file_path)
        
        # Ensure the subfolder exists
        if not os.path.exists(self.subfolder):
          os.makedirs(self.subfolder)
        # Ensure the outputfolder exists
        if not os.path.exists(self.outputfolder):
          os.makedirs(self.outputfolder)
        # Ensure the excelfolder exists
        if not os.path.exists(self.excelfolder):
          os.makedirs(self.excelfolder)
        
        # Debugging: Print the current working directory and the list of JSON files
        print(f"Current working directory: {os.getcwd()}")
        print(f"Found JSON files: {self.json_files}")
        
      def fix_json_files(self):
        for count, ele in enumerate(self.json_files):
          new_file_name = os.path.join(self.outputfolder, f"data_{os.path.basename(ele)}")
          
          try:
            with open(ele, 'r', encoding='utf-8') as f:
              data = json.load(f)
    
            # Debugging: Print the type of data
            print(f"Processing file: {ele}")
            print(f"Type of data: {type(data)}")
            
            # Handle different JSON structures
            if isinstance(data, dict) and "results" in data:
              results = data["results"]
            elif isinstance(data, list):
              results = data
            else:
              print(f"Unexpected JSON structure in file: {ele}")
              continue
    
            # Ensure results is a list or dict before writing
            if isinstance(results, (list, dict)):
              with open(new_file_name, 'w', encoding='utf-8') as f:
                f.write(json.dumps(results, indent=4))
            else:
              print(f"Unexpected type for results in file: {ele}")
          except (json.JSONDecodeError, KeyError) as e:
            print(f"Error processing file {ele}: {e}")
    
      def jsontoexcel(self):
        jsonfile_path = os.path.join(self.out...
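
The listing above is cut off in the source, so the author's conversion step is not shown. As a separate, hedged sketch of how JSON records of this shape could be written to CSV, the following uses only the standard library (`csv.DictWriter`) rather than the pandas-based approach the imports suggest. The function name `json_records_to_csv` and the choice to store nested values as JSON strings are assumptions for illustration.

```python
import csv
import json


def json_records_to_csv(json_path, csv_path):
    """Write a JSON file holding a list of objects to a CSV file.

    Returns the number of data rows written. Nested values are stored as
    JSON strings so that every record still fits on one CSV row.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Accept the two structures handled by the fixer: {"results": [...]} or a bare list
    records = data["results"] if isinstance(data, dict) and "results" in data else data
    if not isinstance(records, list):
        raise ValueError(f"Unexpected JSON structure in {json_path}")

    # Collect the union of keys, preserving first-seen order
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            row = {
                k: json.dumps(v, ensure_ascii=False) if isinstance(v, (dict, list)) else v
                for k, v in record.items()
            }
            writer.writerow(row)  # keys missing from a record are left empty
    return len(records)
```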
    
  8. Dog Food Data Extracted from Chewy (USA) - 4,500 Records in CSV Format

    • crawlfeeds.com
    csv, zip
    Updated Apr 22, 2025
    Cite
    Crawl Feeds (2025). Dog Food Data Extracted from Chewy (USA) - 4,500 Records in CSV Format [Dataset]. https://crawlfeeds.com/datasets/dog-food-data-extracted-from-chewy-usa-4-500-records-in-csv-format
    Available download formats: zip, csv
    Dataset updated
    Apr 22, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    The Dog Food Data Extracted from Chewy (USA) dataset contains 4,500 detailed records of dog food products sourced from one of the leading pet supply platforms in the United States, Chewy. This dataset is ideal for businesses, researchers, and data analysts who want to explore and analyze the dog food market, including product offerings, pricing strategies, brand diversity, and customer preferences within the USA.

    The dataset includes essential information such as product names, brands, prices, ingredient details, product descriptions, weight options, and availability. Organized in a CSV format for easy integration into analytics tools, this dataset provides valuable insights for those looking to study the pet food market, develop marketing strategies, or train machine learning models.

    Key Features:

    • Record Count: 4,500 dog food product records.
    • Data Fields: Product names, brands, prices, descriptions, ingredients, etc. More fields are listed under the data points section.
    • Format: CSV, easy to import into databases and data analysis tools.
    • Source: Extracted from Chewy’s official USA platform.
    • Geography: Focused on the USA dog food market.

    Use Cases:

    • Market Research: Analyze trends and preferences in the USA dog food market, including popular brands, price ranges, and product availability.
    • E-commerce Analysis: Understand how Chewy presents and prices dog food products, helping businesses compare their own product offerings.
    • Competitor Analysis: Compare different brands and products to develop competitive strategies for dog food businesses.
    • Machine Learning Models: Use the dataset for machine learning tasks such as product recommendation systems, demand forecasting, and price optimization.

  9. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Oct 20, 2022
    + more versions
    Cite
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Available download formats: zip
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
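
As a minimal sketch of the `pandas.read_csv()` route mentioned above: the helper name `load_lifesnaps_csv` and the optional `date_column` argument are assumptions for illustration, and the actual file and column names should be taken from the downloaded archive.

```python
import pandas as pd


def load_lifesnaps_csv(path, date_column=None):
    """Read one LifeSnaps CSV export into a DataFrame.

    `date_column` is optional; when given, that column is parsed as datetimes.
    """
    parse_dates = [date_column] if date_column else False
    return pd.read_csv(path, parse_dates=parse_dates)
```

Calling `load_lifesnaps_csv(path).head()` on one of the provided files is a quick way to check which columns it contains before deciding which one to parse as dates.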

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed first.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
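
The three commands above, with the optional credentials, could be assembled programmatically as in the sketch below. The commands as listed here do not show the path to each dump file, so the `dump_dir` argument and the `<collection>.bson` file naming are hypothetical; the helper name `build_restore_commands` is likewise an assumption.

```python
from pathlib import Path


def build_restore_commands(dump_dir, host="localhost:27017", db="rais_anonymized",
                           username=None, password=None):
    """Return one mongorestore argument list per LifeSnaps collection."""
    commands = []
    for collection in ("fitbit", "sema", "surveys"):
        cmd = ["mongorestore", "--host", host, "-d", db, "-c", collection]
        if username is not None and password is not None:
            # Only needed when access control is enabled on the server
            cmd += ["--username", username, "--password", password]
        # Hypothetical location of the collection's BSON dump file
        cmd.append(str(Path(dump_dir) / f"{collection}.bson"))
        commands.append(cmd)
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the paths have been adjusted to the real dump location.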

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  10. CSV-data-load_data_metadata

    • kaggle.com
    zip
    Updated Jun 7, 2023
    Cite
    DINESH JATAV (2023). CSV-data-load_data_metadata [Dataset]. https://www.kaggle.com/datasets/dineshjatav/csv-data-load-data-metadata
    Available download formats: zip (23415375 bytes)
    Dataset updated
    Jun 7, 2023
    Authors
    DINESH JATAV
    Description

    Dataset

    This dataset was created by DINESH JATAV

    Contents

  11. Merge number of excel file,convert into csv file

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
    Available download formats: zip (6731 bytes)
    Dataset updated
    Mar 30, 2024
    Authors
    Aashirvad pandey
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Project Description:

    Title: Pandas Data Manipulation and File Conversion

    Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

    Key Objectives:

    1. DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.
    2. Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.
    3. File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

    Tools and Libraries Used:

    • Python
    • Pandas

    Project Implementation:

    1. DataFrame Creation:

      • Import the Pandas library.
      • Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.
      • Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).
    2. Data Manipulation:

      • Add new columns to the DataFrame representing derived data or computations based on existing columns.
      • Filter the DataFrame to include only specific rows based on certain conditions.
      • Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.
    3. File Conversion:

      • Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.
      • Convert the DataFrame into a CSV (.csv) file using the to_csv() function.
      • Save the generated files to the local file system for further analysis or sharing.
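
The three implementation steps above might look like the following sketch. The sample values, column names, file names, and the function name `build_and_export` are invented for illustration and are not taken from the dataset. Excel export depends on an optional writer engine such as openpyxl, so that call is guarded.

```python
from pathlib import Path

import pandas as pd


def build_and_export(out_dir):
    """Create a sample DataFrame, manipulate it, and write CSV/Excel files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: create a DataFrame from a dictionary of sample data
    df = pd.DataFrame({
        "item": ["pen", "notebook", "stapler", "marker"],
        "unit_price": [1.5, 4.0, 12.0, 2.5],
        "quantity": [10, 3, 1, 6],
        "ordered_on": pd.to_datetime(
            ["2024-03-01", "2024-03-02", "2024-03-02", "2024-03-05"]
        ),
    })

    # Step 2: add a derived column, then filter rows on a condition
    df["total"] = df["unit_price"] * df["quantity"]
    large_orders = df[df["total"] > 12.0]

    # Step 3: convert to CSV, and to Excel when a writer engine is available
    df.to_csv(out_dir / "orders.csv", index=False)
    try:
        df.to_excel(out_dir / "orders.xlsx", index=False, sheet_name="orders")
    except ImportError:
        print("Skipping Excel export: no Excel writer engine is installed")

    return df, large_orders
```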

    Expected Outcome:

    Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

    Conclusion:

    The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines.

    In this dataset, we add a number of data items and make them into a DataFrame, save it in a single Excel file under different sheet names, and then convert that Excel file into a CSV file.

  12. Csv Investments Private Limited Export Import Data | Eximpedia

    • eximpedia.app
    Updated Oct 8, 2025
    + more versions
    Cite
    (2025). Csv Investments Private Limited Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/companies/csv-investments-private-limited/54413494
    Dataset updated
    Oct 8, 2025
    Description

    Csv Investments Private Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  13. Data from: Large Landing Trajectory Data Set for Go-Around Analysis

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Dec 16, 2022
    Cite
    Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling (2022). Large Landing Trajectory Data Set for Go-Around Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7148116
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    ZHAW
    Authors
    Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A large data set of go-arounds, also referred to as missed approaches. The data set supports the paper presented at the OpenSky Symposium on November 10th.

    If you use this data for a scientific publication, please consider citing our paper.

    The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:

    go_arounds_minimal.csv.gz

    Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:

        Column name       Type       Description
        time              date time  UTC time of landing or first GA attempt
        icao24            string     Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
        callsign          string     Aircraft identifier in air-ground communications
        airport           string     ICAO airport code where the aircraft is landing
        runway            string     Runway designator on which the aircraft landed
        has_ga            string     "True" if at least one GA was performed, otherwise "False"
        n_approaches      integer    Number of approaches identified for this flight
        n_rwy_approached  integer    Number of unique runways approached by this flight

    The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to drop flights with n_approaches > 2.
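
    As a minimal sketch of that filter, assuming the minimal file has been loaded into a pandas DataFrame with the columns listed above. The helper name drop_training_flights and the sample rows are made up for illustration and are not part of the data set:

```python
import pandas as pd


def drop_training_flights(df: pd.DataFrame, max_approaches: int = 2) -> pd.DataFrame:
    """Keep only landings with at most `max_approaches` approaches.

    Hypothetical helper; the default threshold of 2 follows the description.
    """
    return df[df["n_approaches"] <= max_approaches].reset_index(drop=True)


# Made-up sample rows, not taken from the data set.
sample = pd.DataFrame(
    {
        "callsign": ["ABC123", "TRAIN01", "XYZ789"],
        "has_ga": [False, True, True],
        "n_approaches": [1, 7, 2],
        "n_rwy_approached": [1, 2, 1],
    }
)
filtered = drop_training_flights(sample)
print(filtered["callsign"].tolist())
```

    The threshold is a parameter because a different cut-off may suit other analyses; n_rwy_approached can be used for a similar filter.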

    go_arounds_augmented.csv.gz

    Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:

        Column name            Type       Description
        time                   date time  UTC time of landing or first GA attempt
        icao24                 string     Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
        callsign               string     Aircraft identifier in air-ground communications
        airport                string     ICAO airport code where the aircraft is landing
        runway                 string     Runway designator on which the aircraft landed
        has_ga                 string     "True" if at least one GA was performed, otherwise "False"
        n_approaches           integer    Number of approaches identified for this flight
        n_rwy_approached       integer    Number of unique runways approached by this flight
        registration           string     Aircraft registration
        typecode               string     Aircraft ICAO typecode
        icaoaircrafttype       string     ICAO aircraft type
        wtc                    string     ICAO wake turbulence category
        glide_slope_angle      float      Angle of the ILS glide slope in degrees
        has_intersection       string     Boolean that is true if the runway has another runway intersecting it, otherwise false
        rwy_length             float      Length of the runway in kilometres
        airport_country        string     ISO Alpha-3 country code of the airport
        airport_region         string     Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
        operator_country       string     ISO Alpha-3 country code of the operator
        operator_region        string     Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
        wind_speed_knts        integer    METAR, surface wind speed in knots
        wind_dir_deg           integer    METAR, surface wind direction in degrees
        wind_gust_knts         integer    METAR, surface wind gust speed in knots
        visibility_m           float      METAR, visibility in m
        temperature_deg        integer    METAR, temperature in degrees Celsius
        press_sea_level_p      float      METAR, sea level pressure in hPa
        press_p                float      METAR, QNH in hPa
        weather_intensity      list       METAR, list of present weather codes: qualifier - intensity
        weather_precipitation  list       METAR, list of present weather codes: weather phenomena - precipitation
        weather_desc           list       METAR, list of present weather codes: qualifier - descriptor
        weather_obscuration    list       METAR, list of present weather codes: weather phenomena - obscuration
        weather_other          list       METAR, list of present weather codes: weather phenomena - other

    This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft data base, the METAR information is from Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.

    go_arounds_agg.csv.gz

    Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

        Column name        Type     Description
        airport            string   ICAO airport code where the aircraft is landing
        runway             string   Runway designator on which the aircraft landed
        n_landings         integer  Total number of landings observed on this runway in 2019
        ga_rate            float    Go-around rate, per 1000 landings
        glide_slope_angle  float    Angle of the ILS glide slope in degrees
        has_intersection   string   Boolean that is true if the runway has another runway intersecting it, otherwise false
        rwy_length         float    Length of the runway in kilometres
        airport_country    string   ISO Alpha-3 country code of the airport
        airport_region     string   Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)

    This aggregated data set is used in the paper for the generalized linear regression model.
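
    To make the n_landings and ga_rate columns concrete, the sketch below derives them from per-landing rows. It assumes that has_ga is boolean and that the rate is computed as 1000 * (landings with a GA) / (all landings); the function name aggregate_go_arounds and the sample rows are illustrative, and the authors' own aggregation may differ in detail:

```python
import pandas as pd


def aggregate_go_arounds(landings: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-landing rows to one row per airport-runway.

    Assumption: ga_rate = 1000 * number of GA landings / number of landings.
    """
    grouped = landings.groupby(["airport", "runway"])["has_ga"]
    agg = grouped.agg(n_landings="size", n_ga="sum").reset_index()
    agg["ga_rate"] = 1000.0 * agg["n_ga"] / agg["n_landings"]
    return agg.drop(columns="n_ga")


# Made-up landings, not taken from the data set.
sample = pd.DataFrame(
    {
        "airport": ["EGLC"] * 6,
        "runway": ["09", "09", "09", "09", "27", "27"],
        "has_ga": [True, False, False, False, False, False],
    }
)
agg = aggregate_go_arounds(sample)
print(agg)
```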

    Downloading the trajectories

    Users of this data set with access to OpenSky Network's Impala shell can download the historical trajectories from the historical data base with a few lines of Python code. For example, to get all the go-arounds of the 4th of January 2019 at London City Airport (EGLC), you can use the Traffic library for easy access to the database:

    import datetime
    from tqdm.auto import tqdm
    import pandas as pd
    from traffic.data import opensky
    from traffic.core import Traffic

    # load minimum data set
    df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
    df["time"] = pd.to_datetime(df["time"])

    # select London City Airport, go-arounds, and 2019-01-04
    airport = "EGLC"
    start = datetime.datetime(year=2019, month=1, day=4).replace(
        tzinfo=datetime.timezone.utc
    )
    stop = datetime.datetime(year=2019, month=1, day=5).replace(
        tzinfo=datetime.timezone.utc
    )

    df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

    # iterate over flights and pull the data from OpenSky Network
    flights = []
    delta_time = pd.Timedelta(minutes=10)
    for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
        # take at most 10 minutes before and 10 minutes after the landing or go-around
        start_time = row["time"] - delta_time
        stop_time = row["time"] + delta_time

        # fetch the data from OpenSky Network
        flights.append(
            opensky.history(
                start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
                stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
                callsign=row["callsign"],
                return_flight=True,
            )
        )

    # The flights can be converted into a Traffic object
    Traffic.from_flights(flights)

    Additional files

    Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:

    validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.

    validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.

  14. Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, zip
    Updated Dec 24, 2022
    Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. http://doi.org/10.5281/zenodo.6965147
    Available download formats: bin, zip, csv
    Dataset updated
    Dec 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    Background

    This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels and one iron-based shape memory alloy is also included. Summary files are included that provide an overview of the database and data from the individual experiments is also included.

    The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

    Usage

    • The data is licensed through the Creative Commons Attribution 4.0 International.
    • If you have used our data and are publishing your work, we ask that you please reference both:
      1. this database through its DOI, and
      2. any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

    Included Files

    • Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.
    • Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.
    • Unreduced_Data-#_v1-0-0.zip: contain the original (not downsampled) data
      • Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.
      • We recommend you un-zip all the folders and place them in one "Unreduced_Data" directory similar to the "Clean_Data"
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Clean_Data_v1-0-0.zip: contains all the downsampled data
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Database_References_v1-0-0.bib
      • Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

    File Format: Downsampled Data

    These are the "LP_

    • The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data
    • Time[s]: time in seconds since the start of the test
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: the surface temperature in degC

    These data files can be easily loaded using the pandas library in Python through:

    import pandas
    data = pandas.read_csv(data_file, index_col=0)

    The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.

    File Format: Unreduced Data

    These are the "LP_

    • The first column is the index of each data point
    • S/No: sample number recorded by the DAQ
    • System Date: Date and time of sample
    • Time[s]: time in seconds since the start of the test
    • C_1_Force[kN]: load cell force
    • C_1_Déform1[mm]: extensometer displacement
    • C_1_Déplacement[mm]: cross-head displacement
    • Eng_Stress[MPa]: engineering stress
    • Eng_Strain[]: engineering strain
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: specimen surface temperature in degC

    The data can be loaded and used similarly to the downsampled data.
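
    The unreduced files carry both engineering and true quantities. For reference, the textbook relations between them, valid while deformation is uniform (before necking), are e_true = ln(1 + Eng_Strain) and Sigma_true = Eng_Stress * (1 + Eng_Strain). The sketch below applies them with pandas. Whether the e_true and Sigma_true columns in the database were produced exactly this way is an assumption, and the function name, the "_check" column names, and the sample values are illustrative only:

```python
import numpy as np
import pandas as pd


def add_true_stress_strain(df: pd.DataFrame) -> pd.DataFrame:
    """Append true strain/stress recomputed from the engineering columns.

    Uses the standard uniform-deformation relations; this is not confirmed
    to be the database's own processing.
    """
    out = df.copy()
    eng_strain = out["Eng_Strain[]"]
    out["e_true_check"] = np.log1p(eng_strain)  # ln(1 + eng strain)
    out["Sigma_true_check"] = out["Eng_Stress[MPa]"] * (1.0 + eng_strain)
    return out


# Made-up values, not taken from the database.
sample = pd.DataFrame(
    {"Eng_Stress[MPa]": [0.0, 200.0, 400.0], "Eng_Strain[]": [0.0, 0.001, 0.05]}
)
checked = add_true_stress_strain(sample)
print(checked)
```

    Comparing the "_check" columns against the stored e_true and Sigma_true columns is one way to test that assumption on a real file.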

    File Format: Overall_Summary

    The overall summary file provides data on all the test specimens in the database. The columns include:

    • hidden_index: internal reference ID
    • grade: material grade
    • spec: specifications for the material
    • source: base material for the test specimen
    • id: internal name for the specimen
    • lp: load protocol
    • size: type of specimen (M8, M12, M20)
    • gage_length_mm_: unreduced section length in mm
    • avg_reduced_dia_mm_: average measured diameter for the reduced section in mm
    • avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm
    • avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm
    • fy_n_mpa_: nominal yield stress
    • fu_n_mpa_: nominal ultimate stress
    • t_a_deg_c_: ambient temperature in degC
    • date: date of test
    • investigator: person(s) who conducted the test
    • location: laboratory where test was conducted
    • machine: setup used to conduct test
    • pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control
    • pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control
    • pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control
    • citekey: reference corresponding to the Database_References.bib file
    • yield_stress_mpa_: computed yield stress in MPa
    • elastic_modulus_mpa_: computed elastic modulus in MPa
    • fracture_strain: computed average true strain across the fracture surface
    • c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass
    • file: file name of corresponding clean (downsampled) stress-strain data

    File Format: Summarized_Mechanical_Props_Campaign

    Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

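    import pandas as pd

    # date and version as they appear in the file name listed under "Included Files"
    date = '2022-08-25_'
    version = 'v1-0-0'
    tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
              index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
              keep_default_na=False, na_values='')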
    • citekey: reference in "Campaign_References.bib".
    • Grade: material grade.
    • Spec.: specifications (e.g., J2+N).
    • Yield Stress [MPa]: initial yield stress in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
    • Elastic Modulus [MPa]: initial elastic modulus in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

    Caveats

    • The files in the following directories were tested before the protocol was established. Therefore, only the true stress-strain is available for each:
      • A500
      • A992_Gr50
      • BCP325
      • BCR295
      • HYP400
      • S460NL
      • S690QL/25mm
      • S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
  15. UCI Diabetes Data Set

    • kaggle.com
    zip
    Updated Oct 7, 2021
    Krittaphas Chaisutyakorn (2021). UCI Diabetes Data Set [Dataset]. https://www.kaggle.com/kritchais/uci-diabetes-data-set
    Available download formats: zip (189079 bytes)
    Dataset updated
    Oct 7, 2021
    Authors
    Krittaphas Chaisutyakorn
    Description

    UCI Diabetes Data Set

    Diabetes patient data from UCI Machine Learning Repository

    Data Set Information:

    Diabetes patient records were obtained from two sources: an automatic electronic recording device and paper records. The automatic device had an internal clock to timestamp events, whereas the paper records only provided "logical time" slots (breakfast, lunch, dinner, bedtime). For paper records, fixed times were assigned to breakfast (08:00), lunch (12:00), dinner (18:00), and bedtime (22:00). Thus paper records have fictitious uniform recording times whereas electronic records have more realistic time stamps.

    Diabetes files consist of four fields per record. Each field is separated by a tab and each record is separated by a newline.

    File Names and format:

    (1) Date in MM-DD-YYYY format
    (2) Time in XX:YY format
    (3) Code
    (4) Value

    The Code field is deciphered as follows:

    33 = Regular insulin dose
    34 = NPH insulin dose
    35 = UltraLente insulin dose
    48 = Unspecified blood glucose measurement
    57 = Unspecified blood glucose measurement
    58 = Pre-breakfast blood glucose measurement
    59 = Post-breakfast blood glucose measurement
    60 = Pre-lunch blood glucose measurement
    61 = Post-lunch blood glucose measurement
    62 = Pre-supper blood glucose measurement
    63 = Post-supper blood glucose measurement
    64 = Pre-snack blood glucose measurement
    65 = Hypoglycemic symptoms
    66 = Typical meal ingestion
    67 = More-than-usual meal ingestion
    68 = Less-than-usual meal ingestion
    69 = Typical exercise activity
    70 = More-than-usual exercise activity
    71 = Less-than-usual exercise activity
    72 = Unspecified special event
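
    The record layout and code table above can be used directly when reading a file. The following is a rough sketch, assuming a headerless, tab-separated file as described. The function name read_diabetes_file, the label column, and the two sample records are made up for illustration, and only a few of the codes are included:

```python
import io

import pandas as pd

# Subset of the code table from the description; extend as needed.
CODE_LABELS = {
    33: "Regular insulin dose",
    34: "NPH insulin dose",
    58: "Pre-breakfast blood glucose measurement",
    62: "Pre-supper blood glucose measurement",
}


def read_diabetes_file(source) -> pd.DataFrame:
    """Read one tab-separated record file and attach a readable code label."""
    df = pd.read_csv(
        source, sep="\t", header=None, names=["date", "time", "code", "value"]
    )
    # Codes missing from CODE_LABELS end up as NaN in the label column.
    df["label"] = df["code"].map(CODE_LABELS)
    return df


# Made-up records in the documented layout (not real patient data).
sample = io.StringIO("04-21-1991\t9:09\t58\t100\n04-21-1991\t9:09\t33\t9\n")
records = read_diabetes_file(sample)
print(records)
```

    Passing a file path instead of the StringIO object reads a file from disk in the same way.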

    The data was extracted with the following code and then saved to a new CSV file.

    import pandas as pd
    from pathlib import Path

    def convert(file):
      # collect one [date, time, code, value] row per tab-separated line
      rows = []

      with open(str(file), 'r') as data:
        for line in data.readlines():
          line = line.replace('\n', '')
          rows.append(line.split('\t'))

      df = pd.DataFrame(rows, columns=['date', 'time', 'code', 'value'])

      new_file = file.parent / f'{file.name}_csv.csv'

      df.to_csv(new_file)
      print(f"{file.name} was saved.")

    path = 'C:/Users/Krittaphas/PycharmProjects/auto/Diabetes-Data'

    for file in Path(path).iterdir():
      convert(file)


    import pandas as pd
    from pathlib import Path

    path = 'C:/Users/Krittaphas/PycharmProjects/auto/Diabetes-Data'

    frames = []

    for file in Path(path).iterdir():

      # characters 5-6 of the file name hold the patient number
      number = file.name[5:7]

      df = pd.read_csv(file)
      df['patient_id'] = number

      frames.append(df)

    # DataFrame.append was removed in pandas 2.0; concatenate once instead
    main_df = pd.concat(frames)
    # keep the data columns only, which drops the saved index column 'Unnamed: 0'
    main_df = main_df[['patient_id', 'date', 'time', 'code', 'value']]
    print(main_df)

    main_df.to_csv(path+'/diabetes_data_all_patient.csv')
    print('Complete')

    
  16. Landmarks Dataset for sign recognition numbers

    • kaggle.com
    zip
    Updated Nov 4, 2022
    Akshat Mittu (2022). Landmarks Dataset for sign recognition numbers [Dataset]. https://www.kaggle.com/datasets/akshatmittu/landmarks-dataset-for-sign-recognition-numbers
    Available download formats: zip (50385 bytes)
    Dataset updated
    Nov 4, 2022
    Authors
    Akshat Mittu
    Description

    This dataset was created from images of hand signs; the hand landmarks detected in each image form the attributes of the dataset. It contains all 21 landmarks, each with its coordinates (x, y, z), and 5 classes (1, 2, 3, 4, 5).

    You can also add more classes to your dataset by running the following code. Make sure to create an empty dataset or append to the existing dataset, and set the file path correctly.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import mediapipe as mp
    import cv2
    import os

    mp_hands = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils

    # start from an empty dataset, or load an existing one here to append to it
    df = pd.DataFrame()

    for t in range(1,6):
      path = 'data/'+str(t)+'/'
      images = os.listdir(path)
      for i in images:
        image = cv2.imread(path+i)
        hands = mp_hands.Hands(static_image_mode=False,max_num_hands=1,min_detection_confidence=0.8,min_tracking_confidence=0.8)
        image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
        image.flags.writeable=False
        results = hands.process(image)
        image.flags.writeable=True
        # skip images in which no hand was detected
        if results.multi_hand_landmarks:
          for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
            mp_draw.draw_landmarks(image = image, landmark_list = hand_landmarks,
                     connections = mp_hands.HAND_CONNECTIONS)
          # one row per image: the label plus x, y, z of each of the 21 landmarks
          a = dict()
          a['label'] = t
          for idx in range(21):
            s = ('x','y','z')
            k = (hand_landmarks.landmark[idx].x,hand_landmarks.landmark[idx].y,hand_landmarks.landmark[idx].z)
            for j in range(len(k)):
              a[str(mp_hands.HandLandmark(idx).name)+'_'+str(s[j])] = k[j]
          # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
          df = pd.concat([df, pd.DataFrame([a])], ignore_index=True)
    
  17. LeetCode CN Problems

    • kaggle.com
    zip
    Updated Apr 5, 2024
    imba-tjd (2024). LeetCode CN Problems [Dataset]. https://www.kaggle.com/datasets/imbatjd/leetcode-cn-problems/code
    Available download formats: zip (530963 bytes)
    Dataset updated
    Apr 5, 2024
    Authors
    imba-tjd
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The data was collected on 2024-04-05 containing 3492 problems.

    Cleaned via the following script.

    import json
    import csv
    from io import TextIOWrapper
    
    
    def clean(data: dict):
      questions = data['data']['problemsetQuestionList']['questions']
      for q in questions:
        yield {
          'id': q['frontendQuestionId'],
          'difficulty': q['difficulty'],
          'title': q['title'],
          'titleCn': q['titleCn'],
          'titleSlug': q['titleSlug'],
          'paidOnly': q['paidOnly'],
          'acRate': round(q['acRate'], 3),
          'topicTags': [t['name'] for t in q['topicTags']],
        }
    
    
    def out_jsonl(f: TextIOWrapper):
      for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
          data = json.load(f2)
    
        for q in clean(data):
          f.write(json.dumps(q, ensure_ascii=False))
          f.write('\n')
    
    
    def out_json(f: TextIOWrapper):
      l = []
      for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
          data = json.load(f2)
    
        for q in clean(data):
          l.append(q)
    
      json.dump(l, f, ensure_ascii=False)
    
    
    def out_csv(f: TextIOWrapper):
      writer = csv.DictWriter(f, fieldnames=[
        'id', 'difficulty', 'title', 'titleCn', 'titleSlug', 'paidOnly', 'acRate', 'topicTags'
      ])
      writer.writeheader()
    
      for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
          data = json.load(f2)
    
        writer.writerows(clean(data))
    
    
    with open('data.jsonl', 'w', encoding='u8') as f:
      out_jsonl(f)
    
    with open('data.json', 'w', encoding='u8') as f:
      out_json(f)
    
    with open('data.csv', 'w', encoding='u8', newline='') as f:
      out_csv(f)
    
  18. Industrial Park Management Bureau of the Ministry of Economic...

    • data.gov.tw
    csv
    Bureau of Industrial Parks, Ministry of Economic Affairs, Industrial Park Management Bureau of the Ministry of Economic Affairs_Statistics on Import and Export Trade Volume of Science and Technology Industrial Parks [Dataset]. https://data.gov.tw/en/datasets/25792
    Available download formats: csv
    Dataset authored and provided by
    Bureau of Industrial Parks, Ministry of Economic Affairs
    License

    https://data.gov.tw/license

    Description

    Provides "Statistics of Import and Export Trade Volume of Each Park" so that the public can follow the imports, exports, and growth trend of each park. The information is updated every month and is also provided as a CSV file for free download and use by the public. The dataset includes statistics on the import and export trade volume of the Nanzih, Kaohsiung, Taichung, Zhonggang, and Pingtung parks and other parks (Lingguang, Chenggong, Gaoruan), with main fields including "Park, Import and Export (This Month, Year-to-Date)", "Export (This Month, Year-to-Date)", and "Import (This Month, Year-to-Date)".

  19. Initial-Data-FitMatrix

    • huggingface.co
    Updated Nov 27, 2025
    Shiro neko (2025). Initial-Data-FitMatrix [Dataset]. https://huggingface.co/datasets/Shiowo2/Initial-Data-FitMatrix
    Dataset updated
    Nov 27, 2025
    Authors
    Shiro neko
    License

    https://choosealicense.com/licenses/odbl/

    Description

    This is the initial dataset we scraped from open maps.

    Be aware: this dataset has not been cleaned yet!
    

    requirements

    !pip install requests

      script
    

    import csv
    import time
    import requests
    from urllib.parse import quote

    OUT_CSV = "jabodetabek_sports_osm.csv"

    BBOX = (-6.80, 106.30, -5.90, 107.20)

    OVERPASS_URL = "https://overpass-api.de/api/interpreter"
    WIKIDATA_ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

    FETCH_WIKIDATA_IMAGES =… See the full description on the dataset page: https://huggingface.co/datasets/Shiowo2/Initial-Data-FitMatrix.

  20. Countries by population 2021 (Worldometer)

    • kaggle.com
    zip
    Updated Aug 16, 2021
    Artem Zapara (2021). Countries by population 2021 (Worldometer) [Dataset]. https://www.kaggle.com/datasets/artemzapara/countries-by-population-2021-worldometer
    Available download formats: zip (8163 bytes)
    Dataset updated
    Aug 16, 2021
    Authors
    Artem Zapara
    Description

    Context

    This dataset is a clean CSV file with the most recent estimates of the population of each country according to Worldometer. The data is taken from the following link: https://www.worldometers.info/world-population/population-by-country/

    Content

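    The data was generated by web scraping the aforementioned link on 16 August 2021. Below is the code used to create the CSV data in Python 3.8:

    from io import StringIO

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    url = "https://www.worldometers.info/world-population/population-by-country/"
    r = requests.get(url)
    # name the parser explicitly to avoid the BeautifulSoup "no parser specified" warning
    soup = BeautifulSoup(r.content, "html.parser")
    # the first table on the page holds the population data
    countries = soup.find_all("table")[0]
    # wrap the HTML in StringIO: newer pandas versions deprecate passing literal HTML strings
    dataframe = pd.read_html(StringIO(str(countries)))[0]
    dataframe.to_csv("countries_by_population_2021.csv", index=False)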

    Acknowledgements

    The creation of this dataset would not have been possible without the team at Worldometers, a data aggregation website.

