Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Open Context (https://opencontext.org) publishes free and open access research data for archaeology and related disciplines. An open source (but bespoke) Django (Python) application supports these data publishing services. The software repository is here: https://github.com/ekansa/open-context-py
The Open Context team runs ETL (extract, transform, load) workflows to import data contributed by researchers from various source relational databases and spreadsheets. Open Context uses a PostgreSQL (https://www.postgresql.org) relational database to manage these imported data in a graph-style schema. The Open Context Python application interacts with the PostgreSQL database via the Django Object-Relational-Model (ORM).
This database dump includes all published structured data organized and used by Open Context (table names that start with 'oc_all_'). The binary media files referenced by these structured data records are stored elsewhere. Binary media files for some projects, still in preparation, are not yet archived with long-term digital repositories.
These data comprehensively reflect the structured data currently published and publicly available on Open Context. Other data (such as user and group information) used to run the Website are not included.
IMPORTANT
This database dump contains data from more than 190 different projects. Each project dataset has its own metadata and citation expectations. If you use these data, you must cite each data contributor appropriately, not just this Zenodo archived database dump.
Ballroom Python South Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 5/5/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably in this project.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available is separate .csv files for each land cover type. Oterh data can be found for the minimum, monthly, and seasonal transit time found in their respective folders. These data were produced using the python code found in the "supporting_code" folder given the ease of working with .nc and EASE grid in the xarray python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a particular function:
01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.
02_functions.R: This script contains the custom functions for this analysis, primarily to work with importing the seasonal transit data. Load this using the source() function in the 01_start.R script.
03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.
04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each landcover type. You load in these data from the 01_start.R script using the source() function.
05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each landcover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.
06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
This is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame; however we put specific emphasis on speed and big data support. As the name suggests, the package is closely related to R's data.table and attempts to mimic its core algorithms and API.
The wheel file for installing datatable v0.11.0
!pip install ../input/python-datatable/datatable-0.11.0-cp37-cp37m-manylinux2010_x86_64.whl > /dev/null
import datatable as dt
data = dt.fread("filename").to_pandas()
https://github.com/h2oai/datatable
https://datatable.readthedocs.io/en/latest/index.html
Eximpedia export-import trade data lets you search trade data and active exporters, importers, buyers, suppliers, and manufacturer exporters from over 209 countries.
Dataset Card for Python-DPO
This dataset is the smaller version of the Python-DPO-Large dataset and has been created using Argilla.
Load with datasets
To load this dataset with the datasets library, install it with pip install datasets --upgrade and then use the following code:
from datasets import load_dataset
ds = load_dataset("NextWealth/Python-DPO")
Data Fields
Each data instance contains:
instruction: The problem description/requirements… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO.
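As a quick check after loading, a minimal sketch for inspecting the data fields (the "train" split name and the printed keys are assumptions based on the description above, not confirmed by the dataset card):
```
from datasets import load_dataset

ds = load_dataset("NextWealth/Python-DPO")
print(ds)                   # available splits and their sizes
example = ds["train"][0]    # "train" split assumed
print(example.keys())       # field names, e.g. "instruction"
```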
Data model and generic query templates for translating and integrating a set of related CSV event logs into a single event graph, as used in https://dx.doi.org/10.1007/s13740-021-00122-1
Provides input data for 5 datasets (BPIC14, BPIC15, BPIC16, BPIC17, BPIC19)
Provides Python scripts to prepare and import each dataset into a Neo4j database instance through Cypher queries, representing behavioral information not globally (as in an event log), but locally per entity and per relation between entities.
Provides Python scripts to retrieve event data from a Neo4j database instance and render it using Graphviz dot.
The data model and queries are described in detail in: Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases (2020) https://arxiv.org/abs/2005.14552 and https://dx.doi.org/10.1007/s13740-021-00122-1
Fork the query code from Github: https://github.com/multi-dimensional-process-mining/graphdb-eventlogs
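For orientation, a minimal sketch of how a Cypher query can be run against a local Neo4j instance from Python; the connection URI, credentials, and the query itself are placeholders, not taken from the repository:
```
from neo4j import GraphDatabase

# Placeholder connection details for a local Neo4j instance
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Placeholder query: count the imported Event nodes
    result = session.run("MATCH (e:Event) RETURN count(e) AS n_events")
    print(result.single()["n_events"])
driver.close()
```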
Disclaimer: This is artificially generated data, created using a Python script based on the arbitrary assumptions listed below.
The data consists of 100,000 examples of training data and 10,000 examples of test data, each representing a user who may or may not buy a smart watch.
----- Version 1 -------
trainingDataV1.csv, testDataV1.csv (or trainingData.csv, testData.csv). The data includes the following features for each user:
1. age: The age of the user (integer, 18-70)
2. income: The income of the user (integer, 25,000-200,000)
3. gender: The gender of the user (string, "male" or "female")
4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
5. hour: The hour of the day (integer, 0-23)
6. weekend: A boolean indicating whether it is the weekend (True or False)
The data also includes a label for each user indicating whether they are likely to buy a smart watch or not (string, "yes" or "no"). The label is determined based on the following arbitrary conditions:
- If the user is divorced and a random number generated by the script is less than 0.4, the label is "no" (i.e., assuming 40% of divorcees are not likely to buy a smart watch).
- If it is the weekend and a random number generated by the script is less than 1.3, the label is "yes" (i.e., assuming sales are 30% more likely to occur on weekends).
- If the user is male and under 30 with an income over 75,000, the label is "yes".
- If the user is female and 30 or over with an income over 100,000, the label is "yes".
- Otherwise, the label is "no".
The training data is intended to be used to build and train a classification model, and the test data is intended to be used to evaluate the performance of the trained model.
The following Python script was used to generate this dataset:
import random
import csv

# Set the number of examples to generate
numExamples = 100000

# Generate the training data
with open("trainingData.csv", "w", newline="") as csvfile:
    fieldnames = ["age", "income", "gender", "maritalStatus", "hour", "weekend", "buySmartWatch"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for i in range(numExamples):
        age = random.randint(18, 70)
        income = random.randint(25000, 200000)
        gender = random.choice(["male", "female"])
        maritalStatus = random.choice(["single", "married", "divorced"])
        hour = random.randint(0, 23)
        weekend = random.choice([True, False])

        # Randomly assign the label based on some arbitrary conditions
        # assuming 40% of divorcees won't buy a smart watch
        if maritalStatus == "divorced" and random.random() < 0.4:
            buySmartWatch = "no"
        # assuming sales are 30% more likely to occur on weekends
        elif weekend == True and random.random() < 1.3:
            buySmartWatch = "yes"
        elif gender == "male" and age < 30 and income > 75000:
            buySmartWatch = "yes"
        elif gender == "female" and age >= 30 and income > 100000:
            buySmartWatch = "yes"
        else:
            buySmartWatch = "no"

        writer.writerow({
            "age": age,
            "income": income,
            "gender": gender,
            "maritalStatus": maritalStatus,
            "hour": hour,
            "weekend": weekend,
            "buySmartWatch": buySmartWatch
        })
----- Version 2 -------
trainingDataV2.csv, testDataV2.csv. The data includes the following features for each user:
1. age: The age of the user (integer, 18-70)
2. income: The income of the user (integer, 25,000-200,000)
3. gender: The gender of the user (string, "male" or "female")
4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
5. educationLevel: The education level of the user (string, "high school", "associate's degree", "bachelor's degree", "master's degree", or "doctorate")
6. occupation: The occupation of the user (string, "tech worker", "manager", "executive", "sales", "customer service", "creative", "manual labor", "healthcare", "education", "government", "unemployed", or "student")
7. familySize: The number of people in the user's family (integer, 1-5)
8. fitnessInterest: A boolean indicating whether the user is interested in fitness (True or False)
9. priorSmartwatchOwnership: A boolean indicating whether the user has owned a smartwatch in the past (True or False)
10. hour: The hour of the day when the user was surveyed (integer, 0-23)
11. weekend: A boolean indicating whether the user was surveyed on a weekend (True or False)
12. buySmartWatch: A boolean indicating whether the user purchased a smartwatch (True or False)
Python script used to generate the data:
import random
import csv
# Set the number of examples to generate
numExamples = 100000
with open("t...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Large go-around (also referred to as missed approach) data set, in support of the paper presented at the OpenSky Symposium on November 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings from 176 (mostly) large airports in 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33,000 GAs. The data was collected from the OpenSky Network's historical database for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:
| Column name | Type | Description |
| --- | --- | --- |
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to drop flights with n_approaches > 2.
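A minimal pandas sketch of that filter, assuming the minimal CSV described above:
```
import pandas as pd

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
# Drop presumed training/calibration flights with many approaches
df_regular = df[df["n_approaches"] <= 2]
```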
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:
| Column name | Type | Description |
| --- | --- | --- |
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
| registration | string | Aircraft registration |
| typecode | string | Aircraft ICAO typecode |
| icaoaircrafttype | string | ICAO aircraft type |
| wtc | string | ICAO wake turbulence category |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
| operator_country | string | ISO Alpha-3 country code of the operator |
| operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania) |
| wind_speed_knts | integer | METAR, surface wind speed in knots |
| wind_dir_deg | integer | METAR, surface wind direction in degrees |
| wind_gust_knts | integer | METAR, surface wind gust speed in knots |
| visibility_m | float | METAR, visibility in metres |
| temperature_deg | integer | METAR, temperature in degrees Celsius |
| press_sea_level_p | float | METAR, sea level pressure in hPa |
| press_p | float | METAR, QNH in hPa |
| weather_intensity | list | METAR, list of present weather codes: qualifier - intensity |
| weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation |
| weather_desc | list | METAR, list of present weather codes: qualifier - descriptor |
| weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration |
| weather_other | list | METAR, list of present weather codes: weather phenomena - other |
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different websites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:
| Column name | Type | Description |
| --- | --- | --- |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| n_landings | integer | Total number of landings observed on this runway in 2019 |
| ga_rate | float | Go-around rate, per 1000 landings |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
This aggregated data set is used in the paper for the generalized linear regression model.
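For a quick look at the aggregated file, a sketch along these lines (assuming pandas and the column names listed above) may be useful:
```
import pandas as pd

agg = pd.read_csv("go_arounds_agg.csv.gz")
# Runways with at least 10,000 observed landings, ranked by go-around rate
busy = agg[agg["n_landings"] >= 10000]
print(busy.sort_values("ga_rate", ascending=False)[["airport", "runway", "n_landings", "ga_rate"]].head(10))
```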
Downloading the trajectories
Users of this data set with access to the OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. For example, suppose you want to get all the go-arounds of 4 January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:
import datetime

from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(tzinfo=datetime.timezone.utc)
stop = datetime.datetime(year=2019, month=1, day=5).replace(tzinfo=datetime.timezone.utc)

df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

flights = []
delta_time = pd.Timedelta(minutes=10)
for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
    # take at most 10 minutes before and 10 minutes after the landing or go-around
    start_time = row["time"] - delta_time
    stop_time = row["time"] + delta_time

    # fetch the data from OpenSky Network
    flights.append(
        opensky.history(
            start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
            stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
            callsign=row["callsign"],
            return_flight=True,
        )
    )

Traffic.from_flights(flights)
Additional files
Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:
validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.
validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geographic Diversity in Public Code Contributions - Replication Package
This document describes how to replicate the findings of the paper: Davide Rossi and Stefano Zacchiroli, 2022, Geographic Diversity in Public Code Contributions - An Exploratory Large-Scale Study Over 50 Years. In 19th International Conference on Mining Software Repositories (MSR ’22), May 23-24, Pittsburgh, PA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3524842.3528471
This document comes with the software needed to mine and analyze the data presented in the paper.
Prerequisites
These instructions assume the use of the bash shell, the Python programming language, the PostgreSQL DBMS (version 11 or later), the zstd compression utility and various usual *nix shell utilities (cat, pv, …), all of which are available for multiple architectures and OSs. It is advisable to create a Python virtual environment and install the following PyPI packages:
click==8.0.4 cycler==0.11.0 fonttools==4.31.2 kiwisolver==1.4.0 matplotlib==3.5.1 numpy==1.22.3 packaging==21.3 pandas==1.4.1 patsy==0.5.2 Pillow==9.0.1 pyparsing==3.0.7 python-dateutil==2.8.2 pytz==2022.1 scipy==1.8.0 six==1.16.0 statsmodels==0.13.2
Initial data
swh-replica, a PostgreSQL database containing a copy of Software Heritage data. The schema for the database is available at https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/. We retrieved these data from Software Heritage, in collaboration with the archive operators, taking an archive snapshot as of 2021-07-07. We cannot make these data available in full as part of the replication package due to both its volume and the presence in it of personal information such as user email addresses. However, equivalent data (stripped of email addresses) can be obtained from the Software Heritage archive dataset, as documented in the article: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli, The Software Heritage Graph Dataset: Public software development under one roof. In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Pages 138-142, IEEE 2019. http://dx.doi.org/10.1109/MSR.2019.00030. Once retrieved, the data can be loaded in PostgreSQL to populate swh-replica.
names.tab - forenames and surnames per country with their frequency
zones.acc.tab - countries/territories, timezones, population and world zones
c_c.tab - ccTLD entities - world zones matches
Data preparation
Export data from the swh-replica database to create commits.csv.zst and authors.csv.zst
sh> ./export.sh
Run the authors cleanup script to create authors--clean.csv.zst
sh> ./cleanup.sh authors.csv.zst
Filter out implausible names and create authors--plausible.csv.zst
sh> pv authors--clean.csv.zst | unzstd | ./filter_names.py 2> authors--plausible.csv.log | zstdmt > authors--plausible.csv.zst
Zone detection by email
Run the email detection script to create author-country-by-email.tab.zst
sh> pv authors--plausible.csv.zst | zstdcat | ./guess_country_by_email.py -f 3 2> author-country-by-email.csv.log | zstdmt > author-country-by-email.tab.zst
Database creation and initial data ingestion
Create the PostgreSQL DB
sh> createdb zones-commit
Note that from now on, commands shown with the psql> prompt are assumed to be executed in psql connected to the zones-commit database.
Import data into PostgreSQL DB
sh> ./import_data.sh
Zone detection by name
Extract commits data from the DB and create commits.tab, that is used as input for the zone detection script
sh> psql -f extract_commits.sql zones-commit
Run the world zone detection script to create commit_zones.tab.zst
sh> pv commits.tab | ./assign_world_zone.py -a -n names.tab -p zones.acc.tab -x -w 8 | zstdmt > commit_zones.tab.zst
Use ./assign_world_zone.py --help if you are interested in changing the script parameters.
Ingest zones assignment data into the DB
psql> \copy commit_zone from program 'zstdcat commit_zones.tab.zst | cut -f1,6 | grep -Ev ''\s$'''
Extraction and graphs
Run the script to execute the queries to extract the data to plot from the DB. This creates commit_zones_7120.tab, author_zones_7120_t5.tab, commit_zones_7120.grid and author_zones_7120_t5.grid. Edit extract_data.sql if you wish to modify extraction parameters (start/end year, sampling, …).
sh> ./extract_data.sh
Run the script to create the graphs from all the previously extracted tabfiles.
sh> ./create_stackedbar_chart.py -w 20 -s 1971 -f commit_zones_7120.grid -f author_zones_7120_t5.grid -o chart.pdf
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains Jupyter Notebooks with examples for accessing USGS NWIS data via web services and performing subsequent analysis related to drought with particular focus on sites in Utah and the southwestern United States (could be modified to any USGS sites). The code uses the Python DataRetrieval package. The resource is part of set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 6 example notebooks:
1. Example 1: Import and plot daily flow data
2. Example 2: Import and plot instantaneous flow data for multiple sites
3. Example 3: Perform analyses with USGS annual statistics data
4. Example 4: Retrieve data and find daily flow percentiles
5. Example 5: Further examination of drought year flows
6. Coding challenge: Assess drought severity
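In the spirit of Example 1 above, a minimal sketch using the Python dataretrieval package; the site number, parameter code, and date range are illustrative and not taken from the notebooks:
```
import dataretrieval.nwis as nwis

# Daily discharge values for an example USGS site and year
df = nwis.get_record(
    sites="09380000",     # illustrative site number
    service="dv",         # daily values
    parameterCd="00060",  # discharge, cubic feet per second
    start="2022-01-01",
    end="2022-12-31",
)
print(df.head())
```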
import_missing_lat_long.py: This script takes a GeoNames URL of a location, retrieves the latitude and longitude of this location from the GeoNames database, and inserts these values in the corresponding Location knowledge element in the CAP.
import_missing_biograpgy.py: This script takes a ULAN URL of an artist, retrieves his/her biographical details from the ULAN database, and inserts these details in the corresponding Person knowledge element in the CAP.
import missing nationalities.py: This script takes a ULAN URL of an artist, retrieves his/her nationality information from the ULAN database, and inserts these details in the corresponding Person knowledge element in the CAP.
import missing alt_names.py: This script takes a ULAN URL of an artist, retrieves the alternative names by which he or she is also known from the ULAN database, and inserts these details in the corresponding Person knowledge element in the CAP.
Find_missing_birth_and_death_information.py: This script takes a ULAN URL of an artist, retrieves his/her birth and death dates from the ULAN database, and inserts these details in the corresponding Person knowledge element in the CAP.
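As a rough illustration of the GeoNames lookup step these scripts perform, a hedged sketch using GeoNames' public JSON web service; the URL parsing, the "demo" username, and the function name are assumptions, and the CAP update step is omitted:
```
import re
import requests

def lat_long_from_geonames_url(url, username="demo"):
    # Extract the numeric GeoNames id from a URL such as
    # https://www.geonames.org/2950159/berlin.html (illustrative)
    geoname_id = re.search(r"/(\d+)", url).group(1)
    resp = requests.get(
        "http://api.geonames.org/getJSON",
        params={"geonameId": geoname_id, "username": username},
        timeout=10,
    )
    data = resp.json()
    return float(data["lat"]), float(data["lng"])
```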
This is a single numpy array with all the Optiver data joined together. It also has some of the features from this notebook It's designed to be mmapped so that you can read small pieces at once.
This is one big array with the trade and book data joined together plus some pre-computed features. The dtype of the array is fp16. The array's shape is (n_times, n_stocks, 600, 27), where 600 is the max seconds_in_bucket and 27 is the number of columns.
Add the dataset to your notebook and then:
import numpy as np

ntimeids = 3830
nstocks = 112
ncolumns = 27
nseq = 600

arr = np.memmap('../input/optiver-precomputed-features-numpy-array/data.array', mode='r', dtype=np.float16, shape=(ntimeids, nstocks, nseq, ncolumns))
There are gaps in the stock ids and time ids, which doesn't work well with an array format, so there are time and stock indexes as well (_ix suffix instead of _id). To calculate these:
import numpy as np
import pandas as pd
targets = pd.read_csv('/kaggle/input/optiver-realized-volatility-prediction/train.csv')
ntimeids = targets.time_id.nunique()
stock_ids = list(sorted(targets.stock_id.unique()))
timeids = sorted(targets.time_id.unique())
timeid_to_ix = {time_id:i for i,time_id in enumerate(timeids)}
stock_id_to_ix = {stock_id:i for i,stock_id in enumerate(stock_ids)}
So to get the data for stock_id 13 on time_id 146 you'd do
stock_ix = stock_id_to_ix[13]
time_ix = timeid_to_ix[146]
arr[time_ix,stock_ix]
Notice that the third dimension is of size 600 (the max number of points for a given time_ix, stock_ix). Some of these will be empty.
To truncate a single stock's data, do:
max_seq_ix = (arr[time_ix,stock_ix,:,-1]>0).cumsum().max()
arr[time_ix,stock_ix,:max_seq_ix,]
There are 27 columns in the last dimension; these are:
['time_id', 'seconds_in_bucket', 'bid_price1', 'ask_price1', 'bid_price2', 'ask_price2', 'bid_size1', 'ask_size1', 'bid_size2', 'ask_size2', 'stock_id', 'wap1', 'wap2', 'log_return1', 'log_return2', 'wap_balance', 'price_spread', 'bid_spread', 'ask_spread', 'total_volume', 'volume_imbalance', 'price', 'size', 'order_count', 'stock_id_y', 'log_return_trade', 'target']
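To pull a single named column out of the last dimension, one option (a sketch reusing arr, time_ix, stock_ix, and max_seq_ix from the snippets above) is to look up its index in that list:
```
columns = ['time_id', 'seconds_in_bucket', 'bid_price1', 'ask_price1', 'bid_price2',
           'ask_price2', 'bid_size1', 'ask_size1', 'bid_size2', 'ask_size2', 'stock_id',
           'wap1', 'wap2', 'log_return1', 'log_return2', 'wap_balance', 'price_spread',
           'bid_spread', 'ask_spread', 'total_volume', 'volume_imbalance', 'price',
           'size', 'order_count', 'stock_id_y', 'log_return_trade', 'target']

wap1_ix = columns.index('wap1')
wap1_values = arr[time_ix, stock_ix, :max_seq_ix, wap1_ix]  # wap1 for one stock/time bucket
```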
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Open Context (https://opencontext.org) publishes free and open access research data for archaeology and related disciplines. An open source (but bespoke) Django (Python) application supports these data publishing services. The software repository is here: https://github.com/ekansa/open-context-py (the "production" branch is the one used for Open Context's primary public deployment).
We also provide a Docker based approach for installing Open Context via this code repository: https://github.com/opencontext/oc-docker (the "production" branch installs the branch of code used for Open Context's primary public deployment).
The Open Context team runs ETL (extract, transform, load) workflows to import data contributed by researchers from various source relational databases and spreadsheets. Open Context uses a PostgreSQL (https://www.postgresql.org) relational database to manage these imported data in a graph-style schema. The Open Context Python application interacts with the PostgreSQL database via the Django Object-Relational-Model (ORM).
This database dump includes all published structured data organized and used by Open Context (table names that start with 'oc_all_'). The binary media files referenced by these structured data records are stored elsewhere. Binary media files for some projects, still in preparation, are not yet archived with long-term digital repositories.
These data comprehensively reflect the structured data currently published and publicly available on Open Context. Other data (such as user and group information) used to run the Website are not included. The data are provided in a plain text SQL dump (for restoration into a version 14+ PostgreSQL database) and in the non-proprietary (but binary) parquet file format.
IMPORTANT
This database dump contains data from more than 190 different projects. Each project dataset has its own metadata and citation expectations. If you use these data, you must cite each data contributor appropriately, not just this Zenodo archived database dump.
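As a rough sketch of inspecting the parquet export in Python (the file name below is illustrative; substitute one of the 'oc_all_' tables actually present in the archive):
```
import pandas as pd

# Illustrative file name -- replace with an actual 'oc_all_' table from the dump
df = pd.read_parquet("oc_all_manifest.parquet")
print(df.shape)
print(df.columns.tolist())
```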
Python module for the evaluation of lab experiments. The module implements functions to import meta-data of measurements, filters to search for subsets of them, and routines to import and plot data from this meta-data. It works well in its original context but is currently in open alpha, since it will be restructured in order to be compatible with new lab environments. Examples of its usage in scientific works will soon be published by the author and can then be used to reference it. Feel free to use it for your own projects and to ask questions. For now you can cite this repository as the source.
You need a running python3 installation on your OS. The module was written on Debian/GNU-Linux, was tested on Windows, and should also run on other OSs.
It is recommended to work in a virtual environment (see the official Python documentation); from bash:
python3 -m venv exp_env
source exp_env/bin/activate
Alternatively, use a conda installation.
Dependencies are the usual scientific modules like numpy, matplotlib, and pandas, but also astropy. See the requirements.txt, from which you should be able to install the dependencies with:
pip install pip -U  # Update pip itself
pip install -r /path/to/requirements.txt
Alternatively you can also install the required modules one by one from the shell. The author recommends to also install jupyter, which includes the interactive ipython:
```
pip install jupyter
pip install numpy
pip install matplotlib
pip install scipy
pip install pandas
pip install astropy
pip install mplcursors
pip install pynufft
```
## The module itself
Inside your virtual environment there is a folder `exp_env/lib/python3.../site-packages`. Place the file `experiment_evaluation.py` inside this folder (or a new sub-folder with all your personal scientific code) to make it accessible.
From within your code (try it from an interactive ipython session) you should now be able to import it via:
import experiment_evaluation as ee
### Matplotlib style
In order to use the fancy custom styles (for example for consistent-looking graphs throughout your publication) it is advised to use matplotlib styles. For the provided styles, copy the custom styles "thesis_default.mplstyle" etc. from the folder `stylelib` into your matplotlib library folder:
`lib/python3.9/site-packages/matplotlib/mpl-data/stylelib/*.mplstyle`
# 🧑💻 Usage
A good way to learn its usage is to have a look at the [example](examples/example_experiment_evaluation.ipynb) file. But since the module is work in progress we first explain some concepts.
## ✨ Why meta-data?
The module automates several steps of experiment evaluations. But the highlight is its capability to handle experimental meta-data. This enables the user to automatically choose and plot data with a question in mind (example: plot all EQE-curves at -2V and 173Hz) instead of repeatedly choosing files manually. For calculations that need more than one measurement this becomes extremely useful but also for implementing statistics.
Meta-data include things like experimental settings (applied voltage on a diode, time of the measurement, temperature etc.), the experimentalist, and technical information (file format etc., manufacturer of the experimental device).
The module includes some generic functions but to use it for your specific lab environment you might need to add experiment and plot specific functions.
## 💾️ How to save your experiment files?
In general lab measurement files stem from different devices and export routines. So frankly speaking lab-data is often a mess! But to use automatic evaluation tools some sort of system to recognize the measurement-type and store the meta-data is needed. In an ideal world a lab would decide on one file format for all measurements and labels them systematically. To include different data-types and their meta-data within one file-type there exists the *.asdf (advanced scientific data format, see their [documentation](https://asdf.readthedocs.io/en/stable/index.html) for further insight). So if you are just starting with your PhD try to use this file format everywhere ;).
Also, to make experiments distinguishable, every experiment needs a unique identifier. So you should also number every new experiment with an increasing number and the type of the experiment.
Example of useful file naming for EQE measurements: `Nr783_EQE.asdf`
In the case of my PhD I decided to use what I found: store the different file formats, store them in folders with the name of the experiment and include meta-data in the file-names (bad example: `EQE/Nr783_3volt_pix1.csv`). This was not the best idea (so learn from what I learned :P)
To handle that mess, this module therefore also implements some regular expressions to extract meta-data from file-names (`ee.meta_from_filename()`), but in general it is advised to store all meta-data in the file-header (with the exception of the unique identifier and experiment type). Like this you could store your files in whatever folder structure you like and still find them from within the script. The module then imports meta-data from the files into a database and you can do fancy data-science with your data!
## 📑️ Database
For calculations and filtering of datasets, the meta-data and data need to be accessible in a machine readable form. For the time being the module imports all meta-data into a pandas DataFrame that represents our database (for very large datasets this would possibly need to be changed). For this we have to name the root folder that includes all experiment files/folders.
**Hint**: If you did not follow the unique labeling/numbering for all your experiments you can still use this module by choosing a root folder that only includes the current experiment.
from pathlib import Path
measurement_root_folder = Path("/home/PhD/Data/")
We can specify some pre-filtering for the specific experiment we want to evaluate:
measurement_folder = measurement_root_folder / "LaserLab" / "proximity-sensor" / "OPD-Lens" / "OPD-Lens_v2"
devices = [nr for nr in range(1035, 1043)] # Unique sample numbers of the experiment listed by list-comprehension
explst = "Mervin Seiberlich"
Then we import the metadata into the pandas DataFrame database via `ee.list_measurements()` and call it *meta-table*:
```
meta_table = ee.list_measurements(measurement_root_folder, devices, experimentalist=explst, sort_by=["measurement_type", "nr", "pix", "v"])
```
Internally, ee.list_measurements() uses custom functions to import the experiment-specific meta-data. Have a look into the source code and search for read_meta for an example of how this works in detail. With the *.asdf file format, only the generalized import function would be needed.
To now import some measurement data for plotting, we use the information inside meta_table with custom import routines and Python dictionaries implementing our filters:
```
lens = {"nr": devices[:5]}
ref = {"nr": devices[5:]}

eqe_lens_0V = ee.import_eqe(meta_table, mask_dict={**lens, **{"v": 0}})
eqe_ref_0V = ee.import_eqe(meta_table, mask_dict={**ref, **{"v": 0}})
```
This yields Python lists `eqe_lens_0V = [table1, table2, ... tableN]` with the selected data ready for plotting (lists are maybe not smart for huge datasets and some N-dimensional object could replace this in the future). Note: the tables inside the list are astropy.QTable() objects including the data and meta-data, as well as units!
So with this few lines of code you already did some advanced data filtering and import!
The module astropy includes a submodule astropy.units. Since we deal with real world data, it is a good idea to also include units in calculations:
```
import astropy.units as u

r = 98 * u.um
```
If you have to repeatedly do some advanced calculations or fits for some plots, include them as functions in the source code. An example would be `ee.pink_noise()`.
For plotting there exist many modules in Python. Due to its great power we use matplotlib. This comes with the cost of some complexity (definitely have a look at its documentation!). But this enables us for example to have a consistent color style, figure size and text size in large projects like a PhD thesis:
mpl.style.use(["thesis_default", "thesis_talk"]) # We use style-sheets to set things like figure-size and text-size, see https://matplotlib.org/stable/tutorials/introductory/customizing.html#composing-styles
w,h = plt.rcParams['figure.figsize'] # get the default size for figures to scale plots accordingly
In order to not invent the wheel over and over again it makes sense to wrap some plotting routines for each experiment inside some custom functions. For further detail see the documentation/recommended function signature for matplotlib specialized functions. This enables easy experiment-type specific plotting (even with statistics) once all functions are set up:
```
fig, ax = plt.subplots(1, 1, figsize=(w, h), layout="constrained")
ee.plot_eqe(ax, eqe_lens_0V, statistics=True, color="tab:green", plot_type="EQE", marker=True, ncol=2)
ee.plot_eqe(ax, eqe_ref_0V, statistics=True, color="tab:blue", plot_type="EQE", marker=True,