License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset has a CSV file that contains around 32,000 Twitter tweets. From this single CSV file, 100 CSV files were created, each containing 320 tweets. Those 100 CSV files are used to validate and test (performance/load testing) the data pipeline components.
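As a rough illustration (not part of the original pipeline), the split of one CSV of roughly 32,000 tweets into 100 files of 320 rows each could be reproduced with pandas; the input and output file names below are assumptions.

```python
import pandas as pd

# Hypothetical file name for the single source CSV of ~32,000 tweets.
df = pd.read_csv("tweets.csv")

chunk_size = 320
for i in range(100):
    # Slice 320 consecutive rows and write them to their own CSV file.
    chunk = df.iloc[i * chunk_size:(i + 1) * chunk_size]
    chunk.to_csv(f"tweets_part_{i:03d}.csv", index=False)
```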
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A comparison of common data import methods including ODBC, CSV uploads, QuickBooks Integration API, and third-party apps, focusing on speed, flexibility, and data handling capabilities.
License: GNU GPL v2.0, https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Replication pack, FSE2018 submission #164
-----------------------------------------
**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. Link to the code will be included in the Camera Ready version as well.

Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset. This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, `common.cache/survival_data.pypi_2008_2017-12_6.csv` in **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- **LICENSE** - text of GPL v3, under which this dataset is published
- **INSTALL.md** - replication guide (~2 pages)
Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on detalization level (see Step 2 for more details):

- up to 2Tb of disk space (see Step 2 detalization levels)
- at least 16Gb of RAM (64 preferable)
- a few hours to a few months of processing time

Step 1 - software
-----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

      git clone https://gitlab.com/user2589/ghd.git
      git checkout 0.1.0

  `cd` into the extracted folder. All commands below assume it as the current directory.
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS`
- install docker. For Ubuntu Linux, the command is `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`. Without this dependency, you might get an error on the next step, but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt`
- disable all APIs except GitHub (Bitbucket and Gitlab support were not yet implemented when this study was in progress): edit `scraper/__init__.py`, comment out everything except GitHub support in `PROVIDERS`.

Step 2 - obtaining the dataset
------------------------------

The ultimate goal of this step is to get the output of the Python function `common.utils.survival_data()` and save it into a CSV file:

    # copy and paste into a Python console
    from common import utils
    survival_data = utils.survival_data('pypi', '2008', smoothing=6)
    survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speed up the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table. The whole process will take 15-30 minutes.

- create a folder `
License: GNU LGPL v3.0, http://www.gnu.org/licenses/lgpl-3.0.html
On the official website, the dataset is available through a SQL Server instance (localhost) and CSVs intended for use with Power BI Desktop running in the Virtual Lab (virtual machine). The first two data-import steps were executed in the virtual lab, and the resulting Power BI tables were then exported to CSVs. Records through the year 2022 were added as required.
This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop in order to follow the lab instructions in the training material on the official website, for example the Power BI Desktop Sales Analysis exercise from Microsoft's PL-300 learning path.
Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created in the first two data-import steps of the PL-300 Microsoft Power BI Data Analyst exam lab.
This dataset includes all the data and R code needed to reproduce the analyses in a forthcoming manuscript: Copes, W. E., Q. D. Read, and B. J. Smith. Environmental influences on drying rate of spray applied disinfestants from horticultural production services. PhytoFrontiers, DOI pending.

Study description: Instructions for disinfestants typically specify a dose and a contact time to kill plant pathogens on production surfaces. A problem occurs when disinfestants are applied to large production areas where the evaporation rate is affected by weather conditions. The common contact time recommendation of 10 min may not be achieved under hot, sunny conditions that promote fast drying. This study is an investigation into how the evaporation rates of six commercial disinfestants vary when applied to six types of substrate materials under cool to hot and cloudy to sunny weather conditions. Initially, disinfestants with low surface tension spread out to provide 100% coverage and disinfestants with high surface tension beaded up to provide about 60% coverage when applied to hard smooth surfaces. Disinfestants applied to porous materials, such as wood and concrete, were quickly absorbed into the body of the material. Even though disinfestants evaporated faster under hot sunny conditions than under cool cloudy conditions, coverage was reduced considerably in the first 2.5 min under most weather conditions and reduced to less than or equal to 50% coverage by 5 min.

Dataset contents: This dataset includes R code to import the data and fit Bayesian statistical models using the model fitting software CmdStan, interfaced with R using the packages brms and cmdstanr. The models (one for 2022 and one for 2023) compare how quickly different spray-applied disinfestants dry, depending on what chemical was sprayed, what surface material it was sprayed onto, and what the weather conditions were at the time. Next, the statistical models are used to generate predictions and compare mean drying rates between the disinfestants, surface materials, and weather conditions. Finally, tables and figures are created. These files are included:

- Drying2022.csv: drying rate data for the 2022 experimental run
- Weather2022.csv: weather data for the 2022 experimental run
- Drying2023.csv: drying rate data for the 2023 experimental run
- Weather2023.csv: weather data for the 2023 experimental run
- disinfestant_drying_analysis.Rmd: RMarkdown notebook with all data processing, analysis, and table creation code
- disinfestant_drying_analysis.html: rendered output of the notebook
- MS_figures.R: additional R code to create figures formatted for journal requirements
- fit2022_discretetime_weather_solar.rds: fitted brms model object for 2022. This allows users to reproduce the model prediction results without having to refit the model, which was originally fit on a high-performance computing cluster
- fit2023_discretetime_weather_solar.rds: fitted brms model object for 2023
- data_dictionary.xlsx: descriptions of each column in the CSV data files
Csv Marketing Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
License: GNU GPL v3.0, https://www.gnu.org/licenses/gpl-3.0.html
Data pulled from Traffy Fondue via the Traffy Fondue Open API, covering January 2022 to January 2025.
The following code pulled the data:
import os
import json
import requests
from datetime import datetime, timedelta
import time
class TraffyDataFetcher:
    def __init__(self, start_date, subfolder='traffyfonduedata'):
        self.url = "https://publicapi.traffy.in.th/share/teamchadchart/search"
        self.query = {'offset': '0'}
        self.payload = {}
        self.headers = {}
        self.start_date = datetime.strptime(start_date, '%Y-%m-%d')
        self.end_date = datetime.now()
        self.subfolder = subfolder
        self.max_requests_per_minute = 99
        if not os.path.exists(self.subfolder):
            os.makedirs(self.subfolder)

    def add_days_to_date(self, start_date_str, days_to_add):
        start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
        new_date = start_date + timedelta(days=days_to_add)
        return new_date.strftime('%Y-%m-%d')

    def fetch_data(self):
        current_date = self.start_date
        index = 0
        while current_date <= self.end_date:
            start_time = datetime.now()
            # Query a 10-day window starting at current_date.
            self.query['start'] = current_date.strftime('%Y-%m-%d')
            new_date = self.add_days_to_date(self.query['start'], 10)
            self.query['end'] = new_date
            response = requests.request("GET", self.url, headers=self.headers, data=self.payload, params=self.query)
            print(f"offset: {index} response: {response.status_code}")
            # Save the raw JSON response for this window.
            filename = f"traffy_{current_date.strftime('%Y-%m-%d')}.json"
            file_path = os.path.join(self.subfolder, filename)
            with open(file_path, "w") as outfile:
                json_object = json.dumps(response.json(), indent=4)
                outfile.write(json_object)
            end_time = datetime.now()
            elapsed_time = (end_time - start_time).total_seconds()
            print(f"Elapsed time: {elapsed_time} s")
            index += 950
            current_date = datetime.strptime(new_date, '%Y-%m-%d') + timedelta(days=1)
            # Simple rate limiting: pause before the next batch of requests
            # (guard against a negative sleep if the request took longer than a minute).
            if index % self.max_requests_per_minute == 0:
                time.sleep(max(0, 60 - elapsed_time))


if __name__ == "__main__":
    fetcher = TraffyDataFetcher(start_date='2022-01-01')
    fetcher.fetch_data()
--
And the following code converted the JSON files to CSV files:
import os
import glob
import json
import pandas as pd
#import numpy as np
class TraffyJSONFixer:
    def __init__(self, path_to_json='*.json', subfolder='traffyfonduedata'):
        self.path_to_json = path_to_json
        self.subfolder = subfolder
        self.outputfolder = 'fixedjson'
        self.excelfolder = 'exceloutput'
        self.file_path = os.path.join(self.subfolder, self.path_to_json)
        self.json_files = glob.glob(self.file_path)
        # Ensure the subfolder exists
        if not os.path.exists(self.subfolder):
            os.makedirs(self.subfolder)
        # Ensure the outputfolder exists
        if not os.path.exists(self.outputfolder):
            os.makedirs(self.outputfolder)
        # Ensure the excelfolder exists
        if not os.path.exists(self.excelfolder):
            os.makedirs(self.excelfolder)
        # Debugging: print the current working directory and the list of JSON files
        print(f"Current working directory: {os.getcwd()}")
        print(f"Found JSON files: {self.json_files}")

    def fix_json_files(self):
        for count, ele in enumerate(self.json_files):
            new_file_name = os.path.join(self.outputfolder, f"data_{os.path.basename(ele)}")
            try:
                with open(ele, 'r', encoding='utf-8') as f:
                    data = json.load(f)
                # Debugging: print the type of data
                print(f"Processing file: {ele}")
                print(f"Type of data: {type(data)}")
                # Handle different JSON structures
                if isinstance(data, dict) and "results" in data:
                    results = data["results"]
                elif isinstance(data, list):
                    results = data
                else:
                    print(f"Unexpected JSON structure in file: {ele}")
                    continue
                # Ensure results is a list or dict before writing
                if isinstance(results, (list, dict)):
                    with open(new_file_name, 'w', encoding='utf-8') as f:
                        f.write(json.dumps(results, indent=4))
                else:
                    print(f"Unexpected type for results in file: {ele}")
            except (json.JSONDecodeError, KeyError) as e:
                print(f"Error processing file {ele}: {e}")

    def jsontoexcel(self):
        jsonfile_path = os.path.join(self.out...
Terms of use: https://crawlfeeds.com/privacy_policy
The Dog Food Data Extracted from Chewy (USA) dataset contains 4,500 detailed records of dog food products sourced from one of the leading pet supply platforms in the United States, Chewy. This dataset is ideal for businesses, researchers, and data analysts who want to explore and analyze the dog food market, including product offerings, pricing strategies, brand diversity, and customer preferences within the USA.
The dataset includes essential information such as product names, brands, prices, ingredient details, product descriptions, weight options, and availability. Organized in a CSV format for easy integration into analytics tools, this dataset provides valuable insights for those looking to study the pet food market, develop marketing strategies, or train machine learning models.
Key Features:
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
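For instance, a minimal sketch (the file name is a placeholder; substitute one of the provided daily or hourly CSV files):

```python
import pandas as pd

# Placeholder file name; use one of the daily or hourly LifeSnaps CSV files you downloaded.
daily_df = pd.read_csv("lifesnaps_daily.csv")
print(daily_df.head())
print(daily_df.columns)
```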
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
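Once restored, the collections can also be queried from Python with pymongo. This is a minimal sketch and assumes the default local instance and the database and collection names used in the commands above:

```python
from pymongo import MongoClient

# Connect to the local MongoDB instance targeted by the mongorestore commands above.
client = MongoClient("mongodb://localhost:27017/")
db = client["rais_anonymized"]

# Count the documents and inspect one example record from the fitbit collection.
print(db.fitbit.count_documents({}))
print(db.fitbit.find_one())
```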
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:
{
_id:
This dataset was created by DINESH JATAV.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Project Description:
Title: Pandas Data Manipulation and File Conversion
Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.
Key Objectives:
Tools and Libraries Used:
Project Implementation:
DataFrame Creation:
Data Manipulation:
File Conversion:
- The DataFrame is exported to an Excel file using the to_excel() function.
- The DataFrame is exported to a CSV file using the to_csv() function.
Expected Outcome:
Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.
Conclusion:
The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this project, we add a number of records, build a DataFrame from them, save the DataFrame to a single Excel file as different sheets, and then convert that Excel file to CSV files.
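A minimal sketch of the workflow described above; the sample data, sheet names, and file names are illustrative only, not the project's actual data:

```python
import pandas as pd

# Illustrative sample data; the actual project builds its own DataFrames.
sales = pd.DataFrame({"item": ["A", "B"], "units": [10, 5]})
costs = pd.DataFrame({"item": ["A", "B"], "cost": [3.5, 2.0]})

# Save both DataFrames into a single Excel workbook as separate sheets
# (requires an Excel engine such as openpyxl).
with pd.ExcelWriter("project_data.xlsx") as writer:
    sales.to_excel(writer, sheet_name="sales", index=False)
    costs.to_excel(writer, sheet_name="costs", index=False)

# Convert each sheet of the Excel file into its own CSV file.
sheets = pd.read_excel("project_data.xlsx", sheet_name=None)
for name, frame in sheets.items():
    frame.to_csv(f"{name}.csv", index=False)
```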
Csv Investments Private Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A large data set of go-arounds (GAs), also referred to as missed approaches. The data set supports the paper presented at the OpenSky Symposium on November 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings at 176 (mostly) large airports in 44 different countries. Each landing is labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33,000 GAs. The data was collected from the OpenSky Network's historical database for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to drop flights with n_approaches > 2.
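For instance, a simple pandas filter along these lines could drop the suspected training and calibration flights (a sketch, not code from the dataset authors):

```python
import pandas as pd

df = pd.read_csv("go_arounds_minimal.csv.gz")
# Keep only flights with at most two identified approaches.
df_filtered = df[df["n_approaches"] <= 2]
```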
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
| registration | string | Aircraft registration |
| typecode | string | Aircraft ICAO typecode |
| icaoaircrafttype | string | ICAO aircraft type |
| wtc | string | ICAO wake turbulence category |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
| operator_country | string | ISO Alpha-3 country code of the operator |
| operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania) |
| wind_speed_knts | integer | METAR, surface wind speed in knots |
| wind_dir_deg | integer | METAR, surface wind direction in degrees |
| wind_gust_knts | integer | METAR, surface wind gust speed in knots |
| visibility_m | float | METAR, visibility in m |
| temperature_deg | integer | METAR, temperature in degrees Celsius |
| press_sea_level_p | float | METAR, sea level pressure in hPa |
| press_p | float | METAR, QNH in hPa |
| weather_intensity | list | METAR, list of present weather codes: qualifier - intensity |
| weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation |
| weather_desc | list | METAR, list of present weather codes: qualifier - descriptor |
| weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration |
| weather_other | list | METAR, list of present weather codes: weather phenomena - other |
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different websites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| n_landings | integer | Total number of landings observed on this runway in 2019 |
| ga_rate | float | Go-around rate, per 1000 landings |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
This aggregated data set is used in the paper for the generalized linear regression model.
Downloading the trajectories
Users of this data set with access to the OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. For example, suppose you want to get all the go-arounds of 4 January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:
import datetime

from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic

# Load the minimal data set and parse the landing times.
df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

# Select the airport and the day of interest (UTC).
airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(tzinfo=datetime.timezone.utc)
stop = datetime.datetime(year=2019, month=1, day=5).replace(tzinfo=datetime.timezone.utc)

df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

flights = []
delta_time = pd.Timedelta(minutes=10)
for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
    # take at most 10 minutes before and 10 minutes after the landing or go-around
    start_time = row["time"] - delta_time
    stop_time = row["time"] + delta_time

    # fetch the data from OpenSky Network
    flights.append(
        opensky.history(
            start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
            stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
            callsign=row["callsign"],
            return_flight=True,
        )
    )

Traffic.from_flights(flights)
Additional files
Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:
validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.
validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials
Background
This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels and one iron-based shape memory alloy is also included. Summary files are included that provide an overview of the database and data from the individual experiments is also included.
The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).
Usage
Included Files
File Format: Downsampled Data
These are the "LP_
These data files can be easily loaded using the pandas library in Python through:
import pandas
data = pandas.read_csv(data_file, index_col=0)
The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
File Format: Unreduced Data
These are the "LP_
The data can be loaded and used similarly to the downsampled data.
File Format: Overall_Summary
The overall summary file provides data on all the test specimens in the database. The columns include:
File Format: Summarized_Mechanical_Props_Campaign
Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,
tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
keep_default_na=False, na_values='')
Caveats
Data Set Information:
Diabetes patient records were obtained from two sources: an automatic electronic recording device and paper records. The automatic device had an internal clock to timestamp events, whereas the paper records only provided "logical time" slots (breakfast, lunch, dinner, bedtime). For paper records, fixed times were assigned to breakfast (08:00), lunch (12:00), dinner (18:00), and bedtime (22:00). Thus paper records have fictitious uniform recording times whereas electronic records have more realistic time stamps.
Diabetes files consist of four fields per record. Each field is separated by a tab and each record is separated by a newline.
File Names and format: (1) Date in MM-DD-YYYY format (2) Time in XX:YY format (3) Code (4) Value
The Code field is deciphered as follows:
33 = Regular insulin dose
34 = NPH insulin dose
35 = UltraLente insulin dose
48 = Unspecified blood glucose measurement
57 = Unspecified blood glucose measurement
58 = Pre-breakfast blood glucose measurement
59 = Post-breakfast blood glucose measurement
60 = Pre-lunch blood glucose measurement
61 = Post-lunch blood glucose measurement
62 = Pre-supper blood glucose measurement
63 = Post-supper blood glucose measurement
64 = Pre-snack blood glucose measurement
65 = Hypoglycemic symptoms
66 = Typical meal ingestion
67 = More-than-usual meal ingestion
68 = Less-than-usual meal ingestion
69 = Typical exercise activity
70 = More-than-usual exercise activity
71 = Less-than-usual exercise activity
72 = Unspecified special event
import pandas as pd
from pathlib import Path


def convert(file):
    # Parse one tab-separated data file into a DataFrame.
    rows = []
    with open(str(file), 'r') as data:
        for line in data.readlines():
            line = line.replace('\n', '')
            rows.append(line.split('\t'))
    df = pd.DataFrame(rows, columns=['date', 'time', 'code', 'value'])
    df.index = range(0, len(df))
    # Write the parsed records next to the original file as CSV.
    new_file = file.parent / f'{file.name}_csv.csv'
    df.to_csv(new_file)
    print(f"{file.name} was saved.")


path = 'C:/Users/Krittaphas/PycharmProjects/auto/Diabetes-Data'
for file in Path(path).iterdir():
    convert(file)
import pandas as pd
from pathlib import Path

path = 'C:/Users/Krittaphas/PycharmProjects/auto/Diabetes-Data'

# Collect the per-patient CSV files and tag each record with its patient id.
frames = []
for file in Path(path).iterdir():
    number = file.name[5:7]
    df = pd.read_csv(file)
    df['patient_id'] = number
    frames.append(df)

main_df = pd.concat(frames, ignore_index=True)
main_df.drop('Unnamed: 0', inplace=True, axis=1)
print(main_df)
main_df.to_csv(path + '/diabetes_data_all_patient.csv')
print('Complete')
This dataset was created from images of hand signs: the hand landmarks detected in each image were turned into the attributes of the dataset. It contains all 21 landmarks, each with its (x, y, z) coordinates, and 5 classes (1, 2, 3, 4, 5).
You can also add more classes to your dataset by running the following code. Make sure to create an empty DataFrame or append to the existing dataset, and set the file path correctly.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mediapipe as mp
import cv2
import os

# Start from an empty DataFrame, or load the existing dataset here to append to it
# (set the file path to your own copy of the dataset).
df = pd.DataFrame()

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils
hands = mp_hands.Hands(static_image_mode=False, max_num_hands=1,
                       min_detection_confidence=0.8, min_tracking_confidence=0.8)

for t in range(1, 6):
    path = 'data/' + str(t) + '/'
    images = os.listdir(path)
    for i in images:
        image = cv2.imread(path + i)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False
        results = hands.process(image)
        image.flags.writeable = True
        if results.multi_hand_landmarks:
            for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
                mp_draw.draw_landmarks(image=image, landmark_list=hand_landmarks,
                                       connections=mp_hands.HAND_CONNECTIONS)
                # Build one row per image: the class label plus x, y, z for all 21 landmarks.
                a = dict()
                a['label'] = t
                for lm in range(21):
                    s = ('x', 'y', 'z')
                    k = (hand_landmarks.landmark[lm].x,
                         hand_landmarks.landmark[lm].y,
                         hand_landmarks.landmark[lm].z)
                    for j in range(len(k)):
                        a[str(mp_hands.HandLandmark(lm).name) + '_' + str(s[j])] = k[j]
                df = pd.concat([df, pd.DataFrame([a])], ignore_index=True)
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The data was collected on 2024-04-05 and contains 3,492 problems. It was cleaned with the following script.
import json
import csv
from io import TextIOWrapper
def clean(data: dict):
    questions = data['data']['problemsetQuestionList']['questions']
    for q in questions:
        yield {
            'id': q['frontendQuestionId'],
            'difficulty': q['difficulty'],
            'title': q['title'],
            'titleCn': q['titleCn'],
            'titleSlug': q['titleSlug'],
            'paidOnly': q['paidOnly'],
            'acRate': round(q['acRate'], 3),
            'topicTags': [t['name'] for t in q['topicTags']],
        }


def out_jsonl(f: TextIOWrapper):
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            for q in clean(data):
                f.write(json.dumps(q, ensure_ascii=False))
                f.write('\n')


def out_json(f: TextIOWrapper):
    l = []
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            for q in clean(data):
                l.append(q)
    json.dump(l, f, ensure_ascii=False)


def out_csv(f: TextIOWrapper):
    writer = csv.DictWriter(f, fieldnames=[
        'id', 'difficulty', 'title', 'titleCn', 'titleSlug', 'paidOnly', 'acRate', 'topicTags'
    ])
    writer.writeheader()
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            writer.writerows(clean(data))


with open('data.jsonl', 'w', encoding='u8') as f:
    out_jsonl(f)
with open('data.json', 'w', encoding='u8') as f:
    out_json(f)
with open('data.csv', 'w', encoding='u8', newline='') as f:
    out_csv(f)
License: https://data.gov.tw/license
Provide "Statistics of Import and Export Trade Volume of Each Park" to let the public understand the import and export and its growth trend of each park. In addition to updating this information every month, CSV file format is also provided for free download and use by the public.The dataset includes statistics on the import and export trade volume of parks such as Nanzih, Kaohsiung, Taichung, Zhonggang, Pingtung, and other parks (Lingguang, Chenggong, Gaoruan), with main fields including "Park, Import and Export (This Month, Year-to-Date)", "Export (This Month, Year-to-Date)", "Import (This Month, Year-to-Date)", and other important information.
License: Open Database License (ODbL), https://choosealicense.com/licenses/odbl/
This is the initial dataset we scraped from OpenStreetMap (OSM). Be aware that this dataset has not been cleaned yet!
!pip install requests
Script:

import csv
import time
import requests
from urllib.parse import quote

OUT_CSV = "jabodetabek_sports_osm.csv"
BBOX = (-6.80, 106.30, -5.90, 107.20)
OVERPASS_URL = "https://overpass-api.de/api/interpreter"
WIKIDATA_ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
FETCH_WIKIDATA_IMAGES =… See the full description on the dataset page: https://huggingface.co/datasets/Shiowo2/Initial-Data-FitMatrix.
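The script above is truncated. As a rough illustration only, a bounding-box Overpass API query might look like the sketch below; the queried tags (leisure=pitch, leisure=sports_centre) and the output columns are assumptions, not necessarily what the original script requests.

```python
import csv
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
BBOX = (-6.80, 106.30, -5.90, 107.20)  # (south, west, north, east), roughly Jabodetabek

# Assumed tags; the original script may request a different set of sport-related features.
query = f"""
[out:json][timeout:60];
(
  node["leisure"~"pitch|sports_centre"]({BBOX[0]},{BBOX[1]},{BBOX[2]},{BBOX[3]});
  way["leisure"~"pitch|sports_centre"]({BBOX[0]},{BBOX[1]},{BBOX[2]},{BBOX[3]});
);
out center;
"""

response = requests.post(OVERPASS_URL, data={"data": query})
elements = response.json().get("elements", [])

# Write a small sample CSV; the column choice is illustrative only.
with open("jabodetabek_sports_osm_sample.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "type", "name", "lat", "lon"])
    for el in elements:
        tags = el.get("tags", {})
        lat = el.get("lat") or el.get("center", {}).get("lat")
        lon = el.get("lon") or el.get("center", {}).get("lon")
        writer.writerow([el["id"], el["type"], tags.get("name", ""), lat, lon])
```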
This dataset is a clean CSV file with the most recent estimates of the population of countries according to Worldometer. The data is taken from the following link: https://www.worldometers.info/world-population/population-by-country/
The data was generated by web scraping the aforementioned link on 16 August 2021. Below is the code used to create the CSV data in Python 3.8:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.worldometers.info/world-population/population-by-country/"

# Download the page and parse it.
r = requests.get(url)
soup = BeautifulSoup(r.content)

# Grab the population table and convert it to a DataFrame.
countries = soup.find_all("table")[0]
dataframe = pd.read_html(str(countries))[0]

# Save the table as a CSV file.
dataframe.to_csv("countries_by_population_2021.csv", index=False)
The creation of this dataset would not be possible without a team of Worldometers, a data aggregation website.