Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild, large-scale physical activity patterns, sleep, stress, and overall health on the one hand, and behavioral patterns and psychological measurements on the other, due to challenges in collecting and releasing such datasets, including waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset: a multi-modal, longitudinal, and geographically distributed dataset containing a plethora of anthropological data, collected unobtrusively over a total period of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal, real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing the Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files with any programming language; in Python, for example, you can load a file into a Pandas DataFrame with the pandas.read_csv() function.
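A minimal sketch in Python (the file name below is a placeholder; substitute any of the provided CSV files):

```python
import pandas as pd

# load one of the provided daily/hourly CSV files into a DataFrame
# (the file name is illustrative, not part of the dataset documentation)
df = pd.read_csv("daily_fitbit.csv")
print(df.head())
```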
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend working with the raw, complete data by importing the LifeSnaps MongoDB database.
To do so, open a terminal/command prompt and run the following command for each collection in the database. Ensure that you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit <path_to_fitbit_bson_file>
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema <path_to_sema_bson_file>
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys <path_to_surveys_bson_file>
If you have access control enabled, you will also need to add the --username and --password parameters to the above commands.
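For example (the username, password, and file path below are placeholders):

```
mongorestore --host localhost:27017 --username <user> --password <password> -d rais_anonymized -c fitbit <path_to_fitbit_bson_file>
```

Depending on your setup, you may also need the --authenticationDatabase parameter.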
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the general format below:

{
    _id: <ObjectId>,
    ...
}
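Once restored, the collections can be queried from Python with pymongo. A minimal sketch, assuming a local MongoDB on the default port and the database and collection names used above:

```python
from pymongo import MongoClient
import pandas as pd

# connect to the locally restored database
client = MongoClient("localhost", 27017)
db = client["rais_anonymized"]

# fetch a few Fitbit documents and flatten them into a DataFrame
docs = list(db["fitbit"].find().limit(5))
df = pd.DataFrame(docs)
print(df.head())
```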
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, structured as a Pandas dataframe. The same data is also available as a semicolon-separated .csv file. The data consists of two columns, where the first column corresponds to annotation note contents and the second column corresponds to annotation titles. The annotations are in Swedish and are processed so that all mentions of personal information are replaced with the string 'egennamn', meaning "personal name" in Swedish. Each row corresponds to one annotation with the corresponding title. The data can be accessed in Python with:

import pandas as pd

annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
annotation_contents = annotations_df['noteComment']
annotation_titles = annotations_df['title']
import pandas as pd
import json

data = [
    ("Today around 11:30am I realized that I've only ever worked in JavaScript and am wondering if I'm cooked...", 7),
    ("Today I learned how to check in. I'll be working on fixing my CoLab and starting to work on any datasets missed", 6),
    ("Today I finally figured out how to set up GitHub repositories and push my first project...", 8),
    ("Today I decided to dive into Python’s object-oriented programming...", 8),
    …

See the full description on the dataset page: https://huggingface.co/datasets/MaryahGreene/regression.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
import pandas as pd
import numpy as np

PERFORMING EDA

# first look at the data and its dtypes
data.head()
data.info()

# all columns except the target (first column)
attributes_data = data.iloc[:, 1:]
attributes_data

# summary statistics and pairwise correlations of the predictors
attributes_data.describe()
attributes_data.corr()

import seaborn as sns
import matplotlib.pyplot as plt

# heatmap of the predictor correlation matrix
correlation_matrix = attributes_data.corr()
plt.figure(figsize=(18, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()
CHECKING IF DATASET IS LINEAR OR NON-LINEAR
# correlation of each predictor with the binary target
correlations = data.corr()['Diabetes_binary'].drop('Diabetes_binary')

plt.figure(figsize=(10, 6))
correlations.plot(kind='bar')
plt.xlabel('Predictor Columns')
plt.ylabel('Correlation values')
plt.title('Correlation between Diabetes_binary and Predictors')
plt.show()
CHECKING FOR NULL AND MISSING VALUES, CLEANING THEM
# isnull() and isna() are aliases; either check suffices
print(data.isnull().sum())
print(data.isna().sum())
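If any missing values do show up, a minimal way to clean them (dropping affected rows is an assumption about the right strategy here; imputation may be preferable for some analyses):

```python
# drop any rows containing missing values
data = data.dropna()
```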
LASSO

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split, GridSearchCV, KFold

# predictors and target, then a 70/30 train/test split
X = data.iloc[:, 1:]
y = data.iloc[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 500 candidate alphas between 1e-5 and 10
# (np.linspace, not np.arange, gives 500 evenly spaced values)
parameters = {"alpha": np.linspace(0.00001, 10, 500)}
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
lassoReg = Lasso()

# grid-search the alpha over 10-fold cross-validation
lasso_cv = GridSearchCV(lassoReg, param_grid=parameters, cv=kfold)
lasso_cv.fit(X, y)
print("Best Params {}".format(lasso_cv.best_params_))
column_names = list(data)
column_names = column_names[1:]
column_names

lassoModel = Lasso(alpha=0.00001)
lassoModel.fit(X_train, y_train)
lasso_coeff = np.abs(lassoModel.coef_)  # making all coefficients positive
plt.bar(column_names, lasso_coeff, color='orange')
plt.xticks(rotation=90)
plt.grid()
plt.title("Feature Selection Based on Lasso")
plt.xlabel("Features")
plt.ylabel("Importance")
plt.ylim(0, 0.16)
plt.show()
RFE

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

from sklearn.feature_selection import RFECV
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
rfecv = RFECV(estimator=model, step=1, cv=20, scoring="accuracy")
rfecv = rfecv.fit(X_train, y_train)
# mean cross-validated accuracy for each candidate number of features
# (rfecv.ranking_ holds feature ranks, not scores; scores live in cv_results_ as of scikit-learn 1.0)
cv_scores = rfecv.cv_results_["mean_test_score"]
num_features_selected = len(cv_scores)
plt.figure(figsize=(10, 6))
plt.xlabel("Number of features selected")
plt.ylabel("Score (accuracy)")
plt.plot(range(1, num_features_selected + 1), cv_scores, marker='o', color='r')
plt.xticks(range(1, num_features_selected + 1))  # set x-ticks to integers
plt.grid()
plt.title("RFECV: Number of Features vs. Score (accuracy)")
plt.show()

print("The optimal number of features:", rfecv.n_features_)
print("Best features:", X_train.columns[rfecv.support_])
PCA

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
X = data.drop(["Diabetes_binary"], axis=1)
y = data["Diabetes_binary"]
# build the frame from the predictors only, so the target does not leak into the PCA
df1 = pd.DataFrame(data=X, columns=X.columns)
print(df1)

scaling = StandardScaler()
scaling.fit(df1)
Scaled_data = scaling.transform(df1)

# project onto the first three principal components
principal = PCA(n_components=3)
principal.fit(Scaled_data)
x = principal.transform(Scaled_data)
print(x.shape)
principal.components_
plt.scatter(x[:, 0], x[:, 1], c=data['Diabetes_binary'], cmap='plasma')
plt.xlabel('pc1')
plt.ylabel('pc2')
print(principal.explained_variance_ratio_)
T-SNE

from sklearn.manifold import TSNE
import seaborn as sns

tsne = TSNE(n_components=3, verbose=1, random_state=42)
z = tsne.fit_transform(X)

df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:, 0]
df["comp-2"] = z[:, 1]
df["comp-3"] = z[:, 2]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("husl", 2),
                data=df).set(title="Diabetes data T-SNE projection")
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
What follows is research code. It is by no means optimized for speed, efficiency, or readability.
Data loading, tokenizing and sharding
import os
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import TruncatedSVD
from tqdm.notebook import tqdm
from openTSNE import TSNE
import datashader as ds
import colorcet as cc

from dask.distributed import Client
import dask.dataframe as dd
import dask_ml
import …

See the full description on the dataset page: https://huggingface.co/datasets/christopher/roots-tsne-data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The ONE DATA data science workflow dataset ODDS-full comprises 815 unique workflows in temporally ordered versions.
A version of a workflow describes its evolution over time: whenever a workflow is altered meaningfully, a new version of the respective workflow is persisted.
Overall, 16035 versions are available.
The ODDS-full workflows represent machine learning workflows expressed as node-heterogeneous DAGs with 156 different node types.
These node types represent various kinds of processing steps of a general machine learning workflow and are grouped into 5 categories, which are listed below.
Any metadata beyond the structure and node types of a workflow has been removed for anonymization purposes.
ODDS, a filtered variant that enforces weak connectedness and only contains workflows with at least 5 different versions and 5 nodes, is available as the default version for supervised and unsupervised learning.
Workflows are served as JSON node-link graphs via networkx.
They can be loaded into Python as follows:
import pandas as pd
import networkx as nx
import json
with open('ODDS.json', 'r') as f:
    graphs = pd.Series(list(map(nx.node_link_graph, json.load(f)['graphs'])))
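Each entry of the resulting series is an ordinary networkx graph, so standard graph operations apply. A small sanity-check sketch (the exact node attribute keys are not documented above, so inspect them first):

```python
# inspect the first workflow version
g = graphs.iloc[0]
print(g.number_of_nodes(), g.number_of_edges())

# peek at a few nodes to see which attribute holds the node type
print(list(g.nodes(data=True))[:5])
```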
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is a single NumPy array with all the Optiver data joined together, along with some precomputed features from a public notebook. It is designed to be memory-mapped so that you can read small pieces at a time.
This is one big array with the trade and book data joined together, plus some pre-computed features. The dtype of the array is float16. The array's shape is (n_times, n_stocks, 600, 27), where 600 is the max seconds_in_bucket and 27 is the number of columns.
Add the dataset to your notebook and then:

```python
import numpy as np

ntimeids = 3830
nstocks = 112
ncolumns = 27
nseq = 600

arr = np.memmap('../input/optiver-precomputed-features-numpy-array/data.array',
                mode='r', dtype=np.float16,
                shape=(ntimeids, nstocks, nseq, ncolumns))
```
There are gaps in the stock IDs and time IDs, which doesn't work well with an array format, so the dataset also uses time and stock indexes (an _ix suffix instead of _id). To calculate these:
```python
import numpy as np
import pandas as pd

targets = pd.read_csv('/kaggle/input/optiver-realized-volatility-prediction/train.csv')

ntimeids = targets.time_id.nunique()
stock_ids = list(sorted(targets.stock_id.unique()))
timeids = sorted(targets.time_id.unique())

timeid_to_ix = {time_id: i for i, time_id in enumerate(timeids)}
stock_id_to_ix = {stock_id: i for i, stock_id in enumerate(stock_ids)}
```
So, to get the data for stock_id 13 at time_id 146, you would do:

```python
stock_ix = stock_id_to_ix[13]
time_ix = timeid_to_ix[146]
arr[time_ix, stock_ix]
```
Notice that the third dimension has size 600 (the maximum number of points for a given time_ix, stock_ix); some of these rows will be empty.
To truncate a single stock's data, do:

```python
max_seq_ix = (arr[time_ix, stock_ix, :, -1] > 0).cumsum().max()
arr[time_ix, stock_ix, :max_seq_ix]
```
The 27 columns in the last dimension are:
['time_id', 'seconds_in_bucket', 'bid_price1', 'ask_price1', 'bid_price2', 'ask_price2', 'bid_size1', 'ask_size1', 'bid_size2', 'ask_size2', 'stock_id', 'wap1', 'wap2', 'log_return1', 'log_return2', 'wap_balance', 'price_spread', 'bid_spread', 'ask_spread', 'total_volume', 'volume_imbalance', 'price', 'size', 'order_count', 'stock_id_y', 'log_return_trade', 'target']
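To pull out a single named column, index the last axis by its position in that list. A small sketch reusing arr, time_ix, and stock_ix from above (the name columns is simply the list above bound to a variable):

```python
# the 27 column names, as documented above
columns = ['time_id', 'seconds_in_bucket', 'bid_price1', 'ask_price1',
           'bid_price2', 'ask_price2', 'bid_size1', 'ask_size1',
           'bid_size2', 'ask_size2', 'stock_id', 'wap1', 'wap2',
           'log_return1', 'log_return2', 'wap_balance', 'price_spread',
           'bid_spread', 'ask_spread', 'total_volume', 'volume_imbalance',
           'price', 'size', 'order_count', 'stock_id_y',
           'log_return_trade', 'target']

# e.g. the wap1 series for one stock at one time id
wap1 = arr[time_ix, stock_ix, :, columns.index('wap1')]
```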
This dataset contains information about the state of crime in Russia since 2003. The table shows the number of crimes of various types for each month.
Your task (or desire) is to analyze the data and forecast the number of crimes for the next month or year. Good luck and peace🙌
Import:
```python
import pandas as pd

data = pd.read_csv('crime.csv', parse_dates=['month'], index_col=['month'], dayfirst=True)
```
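Before forecasting, a quick first look at the series helps (the column names are not documented above, so inspect them first):

```python
import matplotlib.pyplot as plt

# list the available crime-type columns, then plot the monthly counts
print(data.columns)
data.plot(figsize=(12, 6), legend=False)
plt.show()
```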
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
A large data set of go-arounds, also referred to as missed approaches. The data set supports the paper presented at the OpenSky Symposium on November 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings at 176 (mostly) large airports in 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings, with more than 33,000 GAs. The data was collected from the OpenSky Network's historical database for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing, with a minimal amount of information about the landing and whether it was a GA. The data is structured in the following way:
| Column name | Type | Description |
| --- | --- | --- |
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to filter by n_approaches > 2, as in the sketch below.
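For example, with pandas (a sketch; the file name matches the minimal data set above):

```python
import pandas as pd

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)

# keep regular landings, dropping likely training/calibration flights
df = df[df["n_approaches"] <= 2]
```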
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing, with additional information about the landing and whether it was a GA. The data is structured in the following way:
| Column name | Type | Description |
| --- | --- | --- |
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
| registration | string | Aircraft registration |
| typecode | string | Aircraft ICAO typecode |
| icaoaircrafttype | string | ICAO aircraft type |
| wtc | string | ICAO wake turbulence category |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
| operator_country | string | ISO Alpha-3 country code of the operator |
| operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania) |
| wind_speed_knts | integer | METAR, surface wind speed in knots |
| wind_dir_deg | integer | METAR, surface wind direction in degrees |
| wind_gust_knts | integer | METAR, surface wind gust speed in knots |
| visibility_m | float | METAR, visibility in metres |
| temperature_deg | integer | METAR, temperature in degrees Celsius |
| press_sea_level_p | float | METAR, sea level pressure in hPa |
| press_p | float | METAR, QNH in hPa |
| weather_intensity | list | METAR, list of present weather codes: qualifier - intensity |
| weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation |
| weather_desc | list | METAR, list of present weather codes: qualifier - descriptor |
| weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration |
| weather_other | list | METAR, list of present weather codes: weather phenomena - other |
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different websites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway pair, i.e. every runway at every airport for which data is available. The data is structured in the following way:
| Column name | Type | Description |
| --- | --- | --- |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| n_landings | integer | Total number of landings observed on this runway in 2019 |
| ga_rate | float | Go-around rate, per 1000 landings |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
This aggregated data set is used in the paper for the generalized linear regression model.
Downloading the trajectories
Users of this data set with access to the OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. Suppose, for example, that you want to get all the go-arounds of 4 January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:
import datetime

from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(
    tzinfo=datetime.timezone.utc
)
stop = datetime.datetime(year=2019, month=1, day=5).replace(
    tzinfo=datetime.timezone.utc
)

df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

flights = []
delta_time = pd.Timedelta(minutes=10)
for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
    # take at most 10 minutes before and 10 minutes after the landing or go-around
    start_time = row["time"] - delta_time
    stop_time = row["time"] + delta_time

    # fetch the data from OpenSky Network
    flights.append(
        opensky.history(
            start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
            stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
            callsign=row["callsign"],
            return_flight=True,
        )
    )

Traffic.from_flights(flights)
Additional files
Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:
validation_table.xlsx: This Excel sheet was completed manually during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rates of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with headers highlighted in red were filled in manually; the rest is generated automatically.
validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as are available, if fewer than 4000) classified as not having a GA, and up to 8 batches of 10 random landings classified as GA, are plotted. This allows the interested user to easily inspect a random sample of the landings and go-arounds visually.