100+ datasets found

Python Import Data India – Buyers & Importers List
seair.co.in
Cite
Seair Exim, Python Import Data India – Buyers & Importers List [Dataset]. https://www.seair.co.in
Explore at:
Available download formats: .bin, .xml, .csv, .xls
Dataset provided by
Seair Info Solutions PVT
Authors
Seair Exim
Area covered
India
Description
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
Python Import Data in December - Seair.co.in
seair.co.in
Updated Dec 31, 2015
Cite
Seair Exim (2015). Python Import Data in December - Seair.co.in [Dataset]. https://www.seair.co.in
Explore at:
Available download formats: .bin, .xml, .csv, .xls
Dataset updated
Dec 31, 2015
Dataset provided by
Seair Info Solutions PVT
Authors
Seair Exim
Area covered
Pitcairn, Guinea, Korea (Democratic People's Republic of), Bulgaria, Bhutan, Nicaragua, French Guiana, Tonga, Mauritius, Palau
Description
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
Storage and Transit Time Data and Code
zenodo.org
zip
Updated Oct 29, 2024
+ more versions
Cite
Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14009758
Dataset updated
Oct 29, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Andrew Felton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Author: Andrew J. Felton
Date: 10/29/2024

This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

"Global estimates of the storage and transit time of water through vegetation"

Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has been revised.

Data information:

The data folder contains key data sets used for analysis. In particular:

"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study, including global arrays summarizing five-year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data, available as both an array (.nc) and a data table (.csv). These data were produced in Python using the Python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folders primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, which have been extensively processed and filtered here. The "supporting_data" folder also contains annual (2016-2020) MODIS land cover data used in the analysis, with separate folders for the original data (.hdf) and the final processed (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in Python.

Code information:

Python scripts can be found in the "supporting_code" folder.

Each R script in this project has a role:

"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

"02_functions.R": This script contains custom functions. Load this using the
`source()` function in the 01_start.R script.

"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.

"04_figures_tables.R": This is the main workhorse for figure/table production and
supporting analyses. This script generates the key figures and summary statistics
used in the study, which are then saved in the manuscript_figures folder. Note that all
maps were produced using Python code found in the "supporting_code" folder.

"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
Open Context Database SQL Dump
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 23, 2025
Cite
Eric Kansa; Sarah Whitcher Kansa (2025). Open Context Database SQL Dump [Dataset]. http://doi.org/10.5281/zenodo.14728229
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14728229
Dataset updated
Jan 23, 2025
Dataset provided by
Open Context
Authors
Eric Kansa; Sarah Whitcher Kansa
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Open Context (https://opencontext.org) publishes free and open access research data for archaeology and related disciplines. An open source (but bespoke) Django (Python) application supports these data publishing services. The software repository is here: https://github.com/ekansa/open-context-py

The Open Context team runs ETL (extract, transform, load) workflows to import data contributed by researchers from various source relational databases and spreadsheets. Open Context uses a PostgreSQL (https://www.postgresql.org) relational database to manage these imported data in a graph-style schema. The Open Context Python application interacts with the PostgreSQL database via the Django object-relational mapper (ORM).

This database dump includes all published structured data organized and used by Open Context (table names that start with 'oc_all_'). The binary media files referenced by these structured data records are stored elsewhere. Binary media files for some projects, still in preparation, are not yet archived with long-term digital repositories.

These data comprehensively reflect the structured data currently published and publicly available on Open Context. Other data (such as user and group information) used to run the Website are not included.

IMPORTANT

This database dump contains data from more than 190 different projects. Each project dataset has its own metadata and citation expectations. If you use these data, you must cite each data contributor appropriately, not just this Zenodo archived database dump.
Eximpedia Export Import Trade
eximpedia.app
Updated Jan 9, 2025
+ more versions
Cite
Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
Explore at:
Available download formats: .bin, .xml, .csv, .xls
Dataset updated
Jan 9, 2025
Dataset provided by
Eximpedia PTE LTD
Eximpedia Export Import Trade Data
Authors
Seair Exim
Area covered
Hungary, Bahrain, Jordan, Burundi, Mali, Vanuatu, Senegal, Switzerland, Malaysia, Cook Islands
Description
Python Logistics Llc Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Python Import Data in March - Seair.co.in
seair.co.in
Updated Mar 30, 2016
+ more versions
Cite
Seair Exim (2016). Python Import Data in March - Seair.co.in [Dataset]. https://www.seair.co.in
Explore at:
Available download formats: .bin, .xml, .csv, .xls
Dataset updated
Mar 30, 2016
Dataset provided by
Seair Info Solutions PVT
Authors
Seair Exim
Area covered
Cyprus, Tuvalu, Lao People's Democratic Republic, French Polynesia, Chad, Tanzania, Israel, Bermuda, Maldives, Germany
Description
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
Ballroom Python South | See Full Import/Export Data | Eximpedia
eximpedia.app
Updated Jan 8, 2025
+ more versions
Cite
Seair Exim (2025). Ballroom Python South | See Full Import/Export Data | Eximpedia [Dataset]. https://www.eximpedia.app/
Explore at:
Available download formats: .bin, .xml, .csv, .xls
Dataset updated
Jan 8, 2025
Dataset provided by
Eximpedia PTE LTD
Eximpedia Export Import Trade Data
Authors
Seair Exim
Area covered
Mayotte, Guyana, Luxembourg, Croatia, El Salvador, State of, Eritrea, Iceland, Myanmar, Zambia
Description
Ballroom Python South Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Smartwatch Purchase Data
kaggle.com
Updated Dec 30, 2022
Cite
Aayush Chourasiya (2022). Smartwatch Purchase Data [Dataset]. https://www.kaggle.com/datasets/albedo0/smartwatch-purchase-data/versions/2
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Dec 30, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aayush Chourasiya
Description
Disclaimer: This is artificially generated data, produced by a Python script based on the arbitrary assumptions listed below.

The data consists of 100,000 examples of training data and 10,000 examples of test data, each representing a user who may or may not buy a smart watch.

----- Version 1 -------

trainingDataV1.csv, testDataV1.csv or trainingData.csv, testData.csv

The data includes the following features for each user:

1. age: The age of the user (integer, 18-70)
2. income: The income of the user (integer, 25,000-200,000)
3. gender: The gender of the user (string, "male" or "female")
4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
5. hour: The hour of the day (integer, 0-23)
6. weekend: A boolean indicating whether it is the weekend (True or False)

The data also includes a label for each user indicating whether they are likely to buy a smart watch or not (string, "yes" or "no"). The label is determined based on the following arbitrary conditions:

- If the user is divorced and a random number generated by the script is less than 0.4, the label is "no" (i.e., assuming 40% of divorcees are not likely to buy a smart watch)
- If it is the weekend and a random number generated by the script is less than 1.3, the label is "yes" (i.e., assuming sales are 30% more likely to occur on weekends)
- If the user is male and under 30 with an income over 75,000, the label is "yes".
- If the user is female and 30 or over with an income over 100,000, the label is "yes".

Otherwise, the label is "no".

The training data is intended to be used to build and train a classification model, and the test data is intended to be used to evaluate the performance of the trained model.

The following Python script was used to generate this dataset:

    import random
    import csv

    # Set the number of examples to generate
    numExamples = 100000

    # Generate the training data
    with open("trainingData.csv", "w", newline="") as csvfile:
        fieldnames = ["age", "income", "gender", "maritalStatus", "hour", "weekend", "buySmartWatch"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        for i in range(numExamples):
            age = random.randint(18, 70)
            income = random.randint(25000, 200000)
            gender = random.choice(["male", "female"])
            maritalStatus = random.choice(["single", "married", "divorced"])
            hour = random.randint(0, 23)
            weekend = random.choice([True, False])

            # Randomly assign the label based on some arbitrary conditions
            # assuming 40% of divorcees won't buy a smart watch
            if maritalStatus == "divorced" and random.random() < 0.4:
                buySmartWatch = "no"
            # assuming sales are 30% more likely to occur on weekends.
            # Note: random.random() is always below 1.0, so this test always passes on weekends.
            elif weekend == True and random.random() < 1.3:
                buySmartWatch = "yes"
            elif gender == "male" and age < 30 and income > 75000:
                buySmartWatch = "yes"
            elif gender == "female" and age >= 30 and income > 100000:
                buySmartWatch = "yes"
            else:
                buySmartWatch = "no"

            writer.writerow({
                "age": age,
                "income": income,
                "gender": gender,
                "maritalStatus": maritalStatus,
                "hour": hour,
                "weekend": weekend,
                "buySmartWatch": buySmartWatch
            })
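
The Version 1 label rules can also be restated as a small pure function, which makes the order of the branches easier to inspect. This is only a sketch and not the author's script: the name `label_user` and the parameters `divorce_draw` and `weekend_draw` (stand-ins for the two `random.random()` calls) are assumptions introduced for illustration. Because `random.random()` returns a float in [0.0, 1.0), the comparison against 1.3 cannot fail, so any weekend row that is not caught by the divorce rule ends up labelled "yes".

```python
import random

WEEKEND_THRESHOLD = 1.3  # value used in the script; random.random() is always below it


def label_user(age, income, gender, marital_status, weekend, divorce_draw, weekend_draw):
    """Return "yes" or "no" by applying the Version 1 rules in order.

    divorce_draw and weekend_draw are hypothetical stand-ins for random.random().
    """
    if marital_status == "divorced" and divorce_draw < 0.4:
        return "no"
    if weekend and weekend_draw < WEEKEND_THRESHOLD:
        return "yes"
    if gender == "male" and age < 30 and income > 75000:
        return "yes"
    if gender == "female" and age >= 30 and income > 100000:
        return "yes"
    return "no"


# Empirical check that the weekend comparison never fails.
always_below = all(random.random() < WEEKEND_THRESHOLD for _ in range(10_000))
print(always_below)

# A weekend row is labelled "yes" even when no income rule applies.
print(label_user(45, 30000, "female", "single", True, 0.9, 0.99))
```

Passing the random draws in as arguments keeps the function deterministic, which is convenient when checking each rule in isolation.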

----- Version 2 -------

trainingDataV2.csv, testDataV2.csv

The data includes the following features for each user:

1. age: The age of the user (integer, 18-70)
2. income: The income of the user (integer, 25,000-200,000)
3. gender: The gender of the user (string, "male" or "female")
4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
5. educationLevel: The education level of the user (string, "high school", "associate's degree", "bachelor's degree", "master's degree", or "doctorate")
6. occupation: The occupation of the user (string, "tech worker", "manager", "executive", "sales", "customer service", "creative", "manual labor", "healthcare", "education", "government", "unemployed", or "student")
7. familySize: The number of people in the user's family (integer, 1-5)
8. fitnessInterest: A boolean indicating whether the user is interested in fitness (True or False)
9. priorSmartwatchOwnership: A boolean indicating whether the user has owned a smartwatch in the past (True or False)
10. hour: The hour of the day when the user was surveyed (integer, 0-23)
11. weekend: A boolean indicating whether the user was surveyed on a weekend (True or False)
12. buySmartWatch: A boolean indicating whether the user purchased a smartwatch (True or False)

Python script used to generate the data:

import random import csv # Set the number of examples to generate numExamples = 100000 with open("t...
Python Import Data in September - Seair.co.in
seair.co.in
Updated Sep 28, 2016
Cite
Seair Exim (2016). Python Import Data in September - Seair.co.in [Dataset]. https://www.seair.co.in
Explore at:
Available download formats: .bin, .xml, .csv, .xls
Dataset updated
Sep 28, 2016
Dataset provided by
Seair Info Solutions PVT
Authors
Seair Exim
Area covered
Grenada, Luxembourg, Slovenia, Djibouti, Saint Martin (French part), Turks and Caicos Islands, Christmas Island, Holy See, Seychelles, Gibraltar
Description
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
Customers order for a Printing Company (2D Bin Packing and Scheduling)
data.mendeley.com
Updated Dec 30, 2021
+ more versions
Cite
mahdi mostajabdaveh (2021). Customers order for a Printing Company (2D Bin Packing and Scheduling) [Dataset]. http://doi.org/10.17632/bxh46tps75.5
Explore at:
Unique identifier
https://doi.org/10.17632/bxh46tps75.5
Dataset updated
Dec 30, 2021
Authors
mahdi mostajabdaveh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data belong to an actual printing company. Each record in the Excel file Raw Data/Big_Data represents a customer order. In the "ColorMode" column, 4+0 means the order is one-sided and 4+4 means it is two-sided. Files in the Instances folder correspond to the instances used for computational tests in the article. Each of these instances has two related files with the same characteristics: one with a gdx suffix and one without any file extension.

Files with gdx suffix can be read by GAMS

Files without suffix are imported by pickle package in Python as objects of class Input (defined in "Input.py" ). You can read the files using the pickle package and Input.py. More information on pickle package at docs.python.org/3/library/pickle

These files are used to import data to the python implementation. The code and relevant description can be found in Read_input.py file.
Python package Datatable
kaggle.com
Updated Oct 22, 2020
Cite
Kaihua Zhang (2020). Python package Datatable [Dataset]. https://www.kaggle.com/zhangkaihua88/python-datatable/activity
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Oct 22, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kaihua Zhang
Description
Context

This is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame; however we put specific emphasis on speed and big data support. As the name suggests, the package is closely related to R's data.table and attempts to mimic its core algorithms and API.

Content

The wheel file for installing datatable v0.11.0

Installation

!pip install ../input/python-datatable/datatable-0.11.0-cp37-cp37m-manylinux2010_x86_64.whl > /dev/null

Using

    import datatable as dt
    data = dt.fread("filename").to_pandas()

Acknowledgements

https://github.com/h2oai/datatable

Documentation

https://datatable.readthedocs.io/en/latest/index.html

License

https://github.com/h2oai/datatable/blob/main/LICENSE
Stage One Experiment - Datasets
figshare.com
bin
Updated Jan 21, 2025
Cite
Luke Yerbury (2025). Stage One Experiment - Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27427155.v1
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.27427155.v1
Dataset updated
Jan 21, 2025
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Luke Yerbury
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data used in the stage one 1NN classification experiment in: "Comparing Clustering Approaches for Smart Meter Time Series: Investigating the Influence of Dataset Properties on Performance".

All datasets are stored in a dict with tuples of (time series array, class labels). To access data in Python:

    import pickle

    filename = "dataset.txt"
    with open(filename, 'rb') as f:
        data = pickle.load(f)
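
Once loaded, the structure described above (a dict mapping to tuples of time series array and class labels) can be walked as in the sketch below. The dataset names and values are invented, plain lists stand in for whatever array type the real file holds, and the data is round-tripped through an in-memory buffer so that the example does not depend on the actual `dataset.txt`.

```python
import io
import pickle

# Synthetic stand-in for the archive: name -> (time series array, class labels).
# The dataset names and values here are invented for illustration.
example = {
    "dataset_a": ([[0.1, 0.4, 0.3], [0.9, 0.8, 0.7]], [0, 1]),
    "dataset_b": ([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]], [1, 1, 0]),
}

# Round-trip through pickle in memory, mirroring pickle.load(f) on the real file.
buffer = io.BytesIO(pickle.dumps(example))
data = pickle.load(buffer)

summary = {}
for name, (series, labels) in data.items():
    # Each entry is expected to carry one label per time series.
    assert len(series) == len(labels)
    # Record: number of series, length of the first series, distinct labels.
    summary[name] = (len(series), len(series[0]), sorted(set(labels)))

print(summary)
```

Unpacking each value as `(series, labels)` in the loop header fails immediately if an entry does not have the two-element shape the description promises.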
Python-DPO
huggingface.co
Updated Jul 18, 2024
+ more versions
Cite
NextWealth Entrepreneurs Private Limited (2024). Python-DPO [Dataset]. https://huggingface.co/datasets/NextWealth/Python-DPO
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jul 18, 2024
Dataset authored and provided by
NextWealth Entrepreneurs Private Limited
Description
Dataset Card for Python-DPO

This dataset is the smaller version of Python-DPO-Large dataset and has been created using Argilla.

Load with datasets

To load this dataset with datasets, install the library with pip install datasets --upgrade and then use the following code:

    from datasets import load_dataset

    ds = load_dataset("NextWealth/Python-DPO")

Data Fields

Each data instance contains:

instruction: The problem description/requirements… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO.
Python Import Data in August - Seair.co.in
seair.co.in
Updated Aug 20, 2016
Cite
Seair Exim (2016). Python Import Data in August - Seair.co.in [Dataset]. https://www.seair.co.in
Explore at:
Available download formats: .bin, .xml, .csv, .xls
Dataset updated
Aug 20, 2016
Dataset provided by
Seair Info Solutions PVT
Authors
Seair Exim
Area covered
Nepal, Belgium, Christmas Island, Saint Pierre and Miquelon, Falkland Islands (Malvinas), Lebanon, Gambia, Virgin Islands (U.S.), Ecuador, South Africa
Description
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
Data from: Russian Financial Statements Database: A firm-level collection of...
data.niaid.nih.gov
Updated Mar 14, 2025
+ more versions
Cite
Skougarevskiy, Dmitriy (2025). Russian Financial Statements Database: A firm-level collection of the universe of financial statements [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14622208
Explore at:
Dataset updated
Mar 14, 2025
Dataset provided by
Skougarevskiy, Dmitriy
Ledenev, Victor
Bondarkov, Sergey
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Russia
Description
The Russian Financial Statements Database (RFSD) is an open, harmonized collection of annual unconsolidated financial statements of the universe of Russian firms:

🔓 First open data set with information on every active firm in Russia.

🗂️ First open financial statements data set that includes non-filing firms.

🏛️ Sourced from two official data providers: the Rosstat and the Federal Tax Service.

📅 Covers 2011-2023 initially, will be continuously updated.

🏗️ Restores as much data as possible through non-invasive data imputation, statement articulation, and harmonization.

The RFSD is hosted on 🤗 Hugging Face and Zenodo and is stored in a structured, column-oriented, compressed binary format Apache Parquet with yearly partitioning scheme, enabling end-users to query only variables of interest at scale.

The accompanying paper provides internal and external validation of the data: http://arxiv.org/abs/2501.05841.

Here we present the instructions for importing the data in R or Python environment. Please consult with the project repository for more information: http://github.com/irlcode/RFSD.

Importing The Data

You have two options to ingest the data: download the .parquet files manually from Hugging Face or Zenodo or rely on 🤗 Hugging Face Datasets library.

Python

🤗 Hugging Face Datasets

It is as easy as:

    from datasets import load_dataset
    import polars as pl

    # This line will download 6.6GB+ of all RFSD data and store it in a 🤗 cache folder
    RFSD = load_dataset('irlspbru/RFSD')

    # Alternatively, this will download ~540MB with all financial statements for 2023
    # to a Polars DataFrame (requires about 8GB of RAM)
    RFSD_2023 = pl.read_parquet('hf://datasets/irlspbru/RFSD/RFSD/year=2023/*.parquet')

Please note that the data is not shuffled within year, meaning that streaming first n rows will not yield a random sample.

Local File Import

Importing in Python requires pyarrow package installed.

    import pyarrow.dataset as ds
    import polars as pl

    # Read RFSD metadata from local file
    RFSD = ds.dataset("local/path/to/RFSD")

    # Use RFSD.schema to glimpse the data structure and columns' classes
    print(RFSD.schema)

    # Load full dataset into memory
    RFSD_full = pl.from_arrow(RFSD.to_table())

    # Load only 2019 data into memory
    RFSD_2019 = pl.from_arrow(RFSD.to_table(filter=ds.field('year') == 2019))

    # Load only revenue for firms in 2019, identified by taxpayer id
    RFSD_2019_revenue = pl.from_arrow(
        RFSD.to_table(
            filter=ds.field('year') == 2019,
            columns=['inn', 'line_2110']
        )
    )

    # Give suggested descriptive names to variables
    renaming_df = pl.read_csv('local/path/to/descriptive_names_dict.csv')
    RFSD_full = RFSD_full.rename({item[0]: item[1] for item in zip(renaming_df['original'], renaming_df['descriptive'])})
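
The renaming step builds a mapping from original to descriptive column names. A minimal standard-library sketch of that idea follows; it assumes only that the CSV has the `original` and `descriptive` columns referenced above. The two sample rows and the descriptive names `taxpayer_id` and `revenue` are invented for illustration and are not taken from the real `descriptive_names_dict.csv`.

```python
import csv
import io

# Invented two-row sample standing in for descriptive_names_dict.csv.
# 'inn' and 'line_2110' appear in the text above; the descriptive names are made up.
sample_csv = io.StringIO(
    "original,descriptive\n"
    "inn,taxpayer_id\n"
    "line_2110,revenue\n"
)

rows = list(csv.DictReader(sample_csv))

# dict(zip(...)) builds the same mapping as a dict comprehension over zip(...).
rename_map = dict(zip((r["original"] for r in rows), (r["descriptive"] for r in rows)))

# Apply the mapping to a list of column names, leaving unmapped names unchanged.
columns = ["inn", "line_2110", "year"]
renamed = [rename_map.get(c, c) for c in columns]
print(renamed)
```

Using `rename_map.get(c, c)` leaves columns without an entry in the dictionary file untouched, which is one reasonable choice when the mapping does not cover every column.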

R

Local File Import

Importing in R requires arrow package installed.

    library(arrow)
    library(data.table)

    # Read RFSD metadata from local file
    RFSD <- open_dataset("local/path/to/RFSD")

    # Use schema() to glimpse into the data structure and column classes
    schema(RFSD)

    # Load full dataset into memory
    scanner <- Scanner$create(RFSD)
    RFSD_full <- as.data.table(scanner$ToTable())

    # Load only 2019 data into memory
    scan_builder <- RFSD$NewScan()
    scan_builder$Filter(Expression$field_ref("year") == 2019)
    scanner <- scan_builder$Finish()
    RFSD_2019 <- as.data.table(scanner$ToTable())

    # Load only revenue for firms in 2019, identified by taxpayer id
    scan_builder <- RFSD$NewScan()
    scan_builder$Filter(Expression$field_ref("year") == 2019)
    scan_builder$Project(cols = c("inn", "line_2110"))
    scanner <- scan_builder$Finish()
    RFSD_2019_revenue <- as.data.table(scanner$ToTable())

    # Give suggested descriptive names to variables
    renaming_dt <- fread("local/path/to/descriptive_names_dict.csv")
    setnames(RFSD_full, old = renaming_dt$original, new = renaming_dt$descriptive)

Use Cases

🌍 For macroeconomists: Replication of a Bank of Russia study of the cost channel of monetary policy in Russia by Mogiliat et al. (2024) — interest_payments.md

🏭 For IO: Replication of the total factor productivity estimation by Kaukin and Zhemkova (2023) — tfp.md

🗺️ For economic geographers: A novel model-less house-level GDP spatialization that capitalizes on geocoding of firm addresses — spatialization.md

FAQ

Why should I use this data instead of Interfax's SPARK, Moody's Ruslana, or Kontur's Focus?

To the best of our knowledge, the RFSD is the only open data set with up-to-date financial statements of Russian companies published under a permissive licence. Apart from being free-to-use, the RFSD benefits from data harmonization and error detection procedures unavailable in commercial sources. Finally, the data can be easily ingested in any statistical package with minimal effort.

What is the data period?

We provide financials for Russian firms in 2011-2023. We will add the data for 2024 by July, 2025 (see Version and Update Policy below).

Why are there no data for firm X in year Y?

Although the RFSD strives to be an all-encompassing database of financial statements, end users will encounter data gaps:

We do not include financials for firms that we considered ineligible to submit financial statements to the Rosstat/Federal Tax Service by law: financial, religious, or state organizations (state-owned commercial firms are still in the data).

Eligible firms may enjoy the right not to disclose under certain conditions. For instance, Gazprom did not file in 2022 and we had to impute its 2022 data from 2023 filings. Sibur filed only in 2023, and Novatek only in 2020 and 2021. Commercial data providers such as Interfax's SPARK enjoy dedicated access to the Federal Tax Service data and are therefore able to source this information elsewhere.

A firm may have submitted its annual statement but, according to the Uniform State Register of Legal Entities (EGRUL), it was not active in that year. We remove those filings.

Why is the geolocation of firm X incorrect?

We use Nominatim to geocode structured addresses of incorporation of legal entities from the EGRUL. There may be errors in the original addresses that prevent us from geocoding firms to a particular house. Gazprom, for instance, is geocoded to house level in 2014 and 2021-2023, but only to street level for 2015-2020 due to improper handling of the house number by Nominatim; in such cases we have fallen back to street-level geocoding. Additionally, streets in different districts of one city may share identical names. We have ignored those problems in our geocoding and invite your submissions. Finally, the address of incorporation may not correspond to plant locations. For instance, Rosneft has 62 field offices in addition to the central office in Moscow. We ignore the location of such offices in our geocoding, but subsidiaries set up as separate legal entities are still geocoded.

Why is the data for firm X different from https://bo.nalog.ru/?

Many firms submit correcting statements after the initial filing. While we have downloaded the data way past the April, 2024 deadline for 2023 filings, firms may have kept submitting the correcting statements. We will capture them in the future releases.

Why is the data for firm X unrealistic?

We provide the source data as is, with minimal changes. Consider a relatively unknown LLC Banknota. It reported 3.7 trillion rubles in revenue in 2023, or 2% of Russia's GDP. This is obviously an outlier firm with unrealistic financials. We manually reviewed the data and flagged such firms for user consideration (variable outlier), keeping the source data intact.

Why is the data for groups of companies different from their IFRS statements?

We should stress that we provide unconsolidated financial statements filed according to the Russian accounting standards, meaning that it would be wrong to infer financials for corporate groups with this data. Gazprom, for instance, had over 800 affiliated entities and to study this corporate group in its entirety it is not enough to consider financials of the parent company.

Why is the data not in CSV?

The data is provided in Apache Parquet format. This is a structured, column-oriented, compressed binary format allowing for conditional subsetting of columns and rows. In other words, you can easily query financials of companies of interest, keeping only variables of interest in memory, greatly reducing data footprint.

Version and Update Policy

Version (SemVer): 1.0.0.

We intend to update the RFSD annually as the data becomes available, in other words when most of the firms have their statements filed with the Federal Tax Service. The official deadline for filing of previous year statements is April 1. However, every year a portion of firms either fails to meet the deadline or submits corrections afterwards. Filing continues up to the very end of the year, but after the end of April this stream quickly thins out. Nevertheless, there is obviously a trade-off between data completeness and timely version availability. We find it a reasonable compromise to query new data in early June, since on average by the end of May 96.7% of statements are already filed, including 86.4% of all the correcting filings. We plan to make a new version of RFSD available by July.

Licence

Creative Commons License Attribution 4.0 International (CC BY 4.0).

Copyright © the respective contributors.

Citation

Please cite as:

@unpublished{bondarkov2025rfsd,
  title={{R}ussian {F}inancial {S}tatements {D}atabase},
  author={Bondarkov, Sergey and Ledenev, Victor and Skougarevskiy, Dmitriy},
  note={arXiv preprint arXiv:2501.05841},
  doi={https://doi.org/10.48550/arXiv.2501.05841},
  year={2025}
}

Acknowledgments and Contacts

Data collection and processing: Sergey Bondarkov, sbondarkov@eu.spb.ru, Viktor Ledenev, vledenev@eu.spb.ru

Project conception, data validation, and use cases: Dmitriy Skougarevskiy, Ph.D.,
One Classifier Ignores a Feature
zenodo.org
data.niaid.nih.gov
csv
Updated Apr 29, 2022
Cite
Karl Maier (2022). One Classifier Ignores a Feature [Dataset]. http://doi.org/10.5281/zenodo.6502643
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.6502643
Dataset updated
Apr 29, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Karl Maier
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data sets are used in a controlled experiment, where two classifiers should be compared. train_a.csv and explain.csv are slices from the original data set. train_b.csv contains the same instances as in train_a.csv, but with feature x1 set to 0 to make it unusable to classifier B.

The original data set was created and split using this Python code:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, class_sep=0.75, random_state=0)
X *= 100
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

lm = LogisticRegression()
lm.fit(X_train, y_train)
clf_a = lm

clf_b = LogisticRegression()
X2 = X.copy()
X2[:, 0] = 0
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y, test_size=0.5, random_state=0)
clf_b.fit(X2_train, y2_train)

X_explain = X_test
y_explain = y_test
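To see why zeroing x1 makes it unusable to classifier B, note that a column of zeros contributes nothing to the gradient of the logistic loss, so its weight never moves from its starting value. The sketch below illustrates this with plain gradient descent in NumPy rather than scikit-learn's LogisticRegression; the synthetic data and the helper `fit_logistic` are assumptions made for illustration and are not part of the dataset.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=500):
    """Fit an unregularized logistic regression by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)          # gradient w.r.t. the weights
        grad_b = np.mean(p - y)                  # gradient w.r.t. the intercept
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical two-feature data in which both features are informative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Same trick as in the dataset description: zero out the first feature
X_b = X.copy()
X_b[:, 0] = 0

w_a, _ = fit_logistic(X, y)      # "classifier A" sees both features
w_b, _ = fit_logistic(X_b, y)    # "classifier B" cannot use the first one
```

Here `w_b[0]` stays at zero while `w_a[0]` grows, which is the contrast the controlled experiment relies on.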
Data and simulation files for "Constraints on the intergalactic magnetic...
data.europa.eu
data.niaid.nih.gov
+1more
unknown
Updated May 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2024). Data and simulation files for "Constraints on the intergalactic magnetic field using Fermi-LAT and H.E.S.S. blazar observations" [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8014311?locale=sv
Explore at:
unknown(878)Available download formats
Dataset updated
May 11, 2024
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this repository, we provide data files in connection with our paper “Constraints on the intergalactic magnetic field using Fermi-LAT and H.E.S.S. blazar observations”, accepted for publication in the Astrophysical Journals and soon available on arXiv. In the publication, we perform a joint analysis of observations of five blazars with the Fermi Large Area Telescope (LAT) and the High Energy Stereoscopic System (H.E.S.S.) in order to search for signatures of a gamma-ray halo around these sources. The non-detection of such extended emission allows us to place lower limits on the intergalactic magnetic field (IGMF). In this repository, we provide our data analysis products of both H.E.S.S. and LAT data for the case when a template for the halo flux is not included in the data. Furthermore, we provide files that contain the log-likelihood profiles as functions of the IGMF in case the halo emission is included. Lastly, we also provide our template files for the halo, generated with CRPropa 3. Below, we provide minimal code examples to demonstrate how to read in the specific files.

H.E.S.S. observational results

We provide the best-fit spectral parameters as well as the flux points (spectral energy distribution; SED) for the H.E.S.S. observations of the five blazars under consideration. The corresponding files are:
hess_fit_result_*.fits, which contain the best-fit parameters,
hess_sed_file_*.fits, which contain the flux points.
In the file names above, the '*' should be replaced with the corresponding source name, e.g. 1ES0229+200. The files can be read in using astropy:

from astropy.table import Table
src = "1ES0229+200"
best_fit_pars = Table.read("hess_fit_result_1ES0229+200.fits")
sed = Table.read("hess_sed_file_1ES0229+200.fits")

Fermi observational results

For Fermi-LAT, we provide the SED files as well as the best-fit models for the regions of interest (ROIs).
These files are called:
fermi_avg_file_*.npy, which provides the best-fit ROI model,
fermi_sed_file_*.npy, which provides the SED.
Both of these files are generated with fermipy and can be read in the following way:

import numpy as np

# first a little helper function since the
# fermipy analysis was run under python 2.7
def convert(data):
    if isinstance(data, bytes):
        return data.decode('ascii')
    if isinstance(data, dict):
        return dict(map(convert, data.items()))
    if isinstance(data, tuple):
        return tuple(map(convert, data))
    return data

# Load the ROI fit
roi_fit_file = "fermi_avg_file_1ES0229+200.npy"
roi_fit = np.load(roi_fit_file, allow_pickle=True, encoding="latin1").flat[0]

# if you want to inspect the dictionaries in python 3, you need to run the convert function.
# For example, to inspect the central source of the ROI
# you would first get the source name
src_fgl_name = roi_fit['config']['selection']['target']
# and then you can get the dictionary for the central source
src_dict = convert(roi_fit['sources'])[src_fgl_name]

# Load the SED
sed_file = "fermi_sed_file_1ES0229+200.npy"
sed = np.load(sed_file, allow_pickle=True, encoding='latin1').flat[0]

# to plot the SED, you can use the SEDPlotter class from fermipy
from fermipy.plotting import SEDPlotter
SEDPlotter.plot_sed(sed)

Likelihood profiles

The likelihood profiles as a function of the IGMF strength are provided in the files logl_profile_*_*yr.npz. They are provided for all five sources and all tested blazar activity times of 10, 10^4, and 10^7 years.
They can be read in with the following code snippet:

import numpy as np
logl = dict(np.load("logl_profile_1ES0229+200_1.0e+07yr.npz"))
b_fields = np.array([1.00000e-16, 3.16228e-16, 1.00000e-15, 3.16228e-15, 1.00000e-14, 3.16228e-14, 1.00000e-13])
for k, v in logl.items():
    print(k, v)

As the print command shows, the python dictionary contains 3 entries: "fermi_only" are the likelihood values for the Fermi data as a function of magnetic field, "combined" are the likelihood values from Fermi and H.E.S.S. combined, and "ps" is the likelihood value of the fit without halo to the H.E.S.S. data only.

Halo simulations

Lastly, we also provide the simulation output files from CRPropa. For details on how the simulations were run, please consult the accompanying paper, in particular Section 3.1 and Appendix C. For each source redshift, a tar file is provided, which itself contains 7 hdf5 files with the simulation outputs for each tested magnetic field strength. The files are named casc_file_z*.tar.gz. After unpacking the files, they can be read in with your favorite hdf5 library; in python you would need to install h5py. We recommend that you check out this github repository, which provides an advanced python wrapper for CRPropa and functions to read in the files. In particular, you can use this function to read in the files. It also writes a new hdf5 file with parallel transport applied. The written data is also returned together with the configuration dictionary.

from simCRpropa.cascmaps import stack_results_lso
data, config = stack_results_lso("casc_file_z0.140_B1.00e-16.hdf5", "casc_f
Hydroinformatics Instruction Module Example Code: Programmatic Data Access...
hydroshare.org
beta.hydroshare.org
+1more
zip
Updated Mar 3, 2022
Cite
Amber Spackman Jones; Jeffery S. Horsburgh (2022). Hydroinformatics Instruction Module Example Code: Programmatic Data Access with USGS Data Retrieval [Dataset]. https://www.hydroshare.org/resource/a58b5d522d7f4ab08c15cd05f3fd2ad3
Explore at:
zip(34.5 KB)Available download formats
Dataset updated
Mar 3, 2022
Dataset provided by
HydroShare
Authors
Amber Spackman Jones; Jeffery S. Horsburgh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This resource contains Jupyter Notebooks with examples for accessing USGS NWIS data via web services and performing subsequent drought-related analysis, with particular focus on sites in Utah and the southwestern United States (the code could be modified for any USGS sites). The code uses the Python DataRetrieval package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.

This resource consists of 6 example notebooks:
1. Example 1: Import and plot daily flow data
2. Example 2: Import and plot instantaneous flow data for multiple sites
3. Example 3: Perform analyses with USGS annual statistics data
4. Example 4: Retrieve data and find daily flow percentiles
5. Example 5: Further examination of drought year flows
6. Coding challenge: Assess drought severity
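As a rough illustration of the kind of calculation Example 4 refers to, the sketch below computes flow percentiles from a series of daily flows. It uses synthetic values instead of a live NWIS query, so the data, the chosen percentile levels, and the helper `percentile_rank` are assumptions and do not come from the notebooks themselves.

```python
import numpy as np

# Hypothetical daily mean flows (cfs) for one site over ten years; real values
# would come from a USGS NWIS query, e.g. via the DataRetrieval package.
rng = np.random.default_rng(42)
daily_flow = rng.lognormal(mean=4.0, sigma=0.8, size=365 * 10)

# A few percentile levels spanning low to high flows (assumed for illustration)
levels = [5, 10, 25, 50, 75, 90, 95]
percentiles = dict(zip(levels, np.percentile(daily_flow, levels)))

def percentile_rank(value, sample):
    """Percent of observations in `sample` that are <= `value`."""
    return 100.0 * np.mean(np.asarray(sample) <= value)

# Sanity check: the median of the record should rank close to 50
rank_of_median = percentile_rank(percentiles[50], daily_flow)
```

A flow observed during a suspected drought year could be passed to `percentile_rank` in the same way to see how unusual it is relative to the record.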
Optiver Precomputed Features Numpy Array
kaggle.com
Updated Aug 14, 2021
Cite
Tal Perry (2021). Optiver Precomputed Features Numpy Array [Dataset]. https://www.kaggle.com/lighttag/optiver-precomputed-features-numpy-array
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 14, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Tal Perry
Description
What's In This

This is a single numpy array with all the Optiver data joined together. It also has some of the features from this notebook. It's designed to be mmapped so that you can read small pieces at once.

This is one big array with the trade and book data joined together, plus some pre-computed features. The dtype of the array is fp16. The array's shape is (n_times, n_stocks, 600, 27), where 600 is the max seconds_in_bucket and 27 is the number of columns.

How To Use It

Add the dataset to your notebook and then run:

import numpy as np
ntimeids = 3830
nstocks = 112
ncolumns = 27
nseq = 600
arr = np.memmap('../input/optiver-precomputed-features-numpy-array/data.array', mode='r', dtype=np.float16, shape=(ntimeids, nstocks, nseq, ncolumns))
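The memory-mapping behaviour can be tried out without the Kaggle file. The sketch below writes a much smaller float16 array with the same axis layout to a temporary file and then maps it read-only; the toy dimensions and the temporary path are assumptions for illustration.

```python
import os
import tempfile
import numpy as np

# Toy dimensions (assumed); the real array uses
# ntimeids=3830, nstocks=112, nseq=600, ncolumns=27.
ntimeids, nstocks, nseq, ncolumns = 4, 3, 600, 27
shape = (ntimeids, nstocks, nseq, ncolumns)

path = os.path.join(tempfile.mkdtemp(), "data.array")

# Write a small float16 array to disk in the same raw layout
writer = np.memmap(path, mode="w+", dtype=np.float16, shape=shape)
writer[2, 1, :5, 0] = np.arange(1, 6)
writer.flush()
del writer

# Read-only mapping: data is only pulled from disk when a slice is accessed
arr = np.memmap(path, mode="r", dtype=np.float16, shape=shape)
block = np.array(arr[2, 1, :5, 0])  # copies just these five values into memory
```

Because the file is raw binary with no header, the dtype and shape passed to `np.memmap` must match the ones used when the file was written.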

Caveats

Handling Varying Sequence Sizes

There are gaps in the stock ids and time ids, which doesn't work great with an array format. So we have time and stocks indexes as well (_ix suffix instead of _id). To calculate these:

import numpy as np
import pandas as pd

targets = pd.read_csv('/kaggle/input/optiver-realized-volatility-prediction/train.csv')
ntimeids = targets.time_id.nunique()
stock_ids = list(sorted(targets.stock_id.unique()))
timeids = sorted(targets.time_id.unique())
timeid_to_ix = {time_id: i for i, time_id in enumerate(timeids)}
stock_id_to_ix = {stock_id: i for i, stock_id in enumerate(stock_ids)}

Getting data For a particular stock id / time id

So to get the data for stock_id 13 on time_id 146 you'd do:

stock_ix = stock_id_to_ix[13]
time_ix = timeid_to_ix[146]
arr[time_ix, stock_ix]
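The id-to-index lookup can be checked on a toy example. In the sketch below the id values and the all-zero array are made up for illustration; only the mapping logic mirrors the description above.

```python
import numpy as np

# Hypothetical, gappy ids standing in for the values read from train.csv
stock_ids = sorted({0, 1, 13, 27})
time_ids = sorted({5, 11, 146, 150})

# Dense positions for each id, in sorted order
stock_id_to_ix = {stock_id: i for i, stock_id in enumerate(stock_ids)}
timeid_to_ix = {time_id: i for i, time_id in enumerate(time_ids)}

# Toy array with the same axis order: (time, stock, second, column)
arr = np.zeros((len(time_ids), len(stock_ids), 600, 27), dtype=np.float16)

stock_ix = stock_id_to_ix[13]    # third-smallest stock id, so index 2
time_ix = timeid_to_ix[146]      # third-smallest time id, so index 2
bucket = arr[time_ix, stock_ix]  # all rows and columns for that pair
```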

Notice that the third dimension is of size 600 (the max number of points for a given time_ix, stock_ix). Some of these will be empty. To truncate a single stock's data, do:

max_seq_ix = (arr[time_ix, stock_ix, :, -1] > 0).cumsum().max()
arr[time_ix, stock_ix, :max_seq_ix, ]
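The truncation expression counts the rows whose last column is positive. That count equals the sequence length only under the assumption that filled rows sit contiguously at the start of the bucket and have a positive value in the last column. A toy check under that assumption:

```python
import numpy as np

nseq, ncolumns = 600, 27

# Toy bucket: pretend only the first three rows were filled
bucket = np.zeros((nseq, ncolumns), dtype=np.float16)
bucket[:3, -1] = 0.5

filled = bucket[:, -1] > 0           # True for rows with a positive last column
max_seq_ix = filled.cumsum().max()   # running count of filled rows; max is the total
truncated = bucket[:max_seq_ix]      # keep only the filled prefix
```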

Column Mappings

There are 27 columns in the last dimension. These are:

['time_id', 'seconds_in_bucket', 'bid_price1', 'ask_price1', 'bid_price2', 'ask_price2', 'bid_size1', 'ask_size1', 'bid_size2', 'ask_size2', 'stock_id', 'wap1', 'wap2', 'log_return1', 'log_return2', 'wap_balance', 'price_spread', 'bid_spread', 'ask_spread', 'total_volume', 'volume_imbalance', 'price', 'size', 'order_count', 'stock_id_y', 'log_return_trade', 'target']
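Since the array itself carries no column names, a small name-to-position lookup built from this list can make slicing less error-prone. This is a convenience sketch, not part of the dataset; the all-zero `bucket` stands in for a real `arr[time_ix, stock_ix]` slice.

```python
import numpy as np

columns = ['time_id', 'seconds_in_bucket', 'bid_price1', 'ask_price1',
           'bid_price2', 'ask_price2', 'bid_size1', 'ask_size1', 'bid_size2',
           'ask_size2', 'stock_id', 'wap1', 'wap2', 'log_return1',
           'log_return2', 'wap_balance', 'price_spread', 'bid_spread',
           'ask_spread', 'total_volume', 'volume_imbalance', 'price', 'size',
           'order_count', 'stock_id_y', 'log_return_trade', 'target']

# Map each column name to its position in the last dimension
col_ix = {name: i for i, name in enumerate(columns)}

# Toy stand-in for arr[time_ix, stock_ix]
bucket = np.zeros((600, len(columns)), dtype=np.float16)

wap1 = bucket[:, col_ix['wap1']]                   # one column by name
prices = bucket[:, [col_ix['bid_price1'], col_ix['ask_price1']]]  # several at once
```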
experiment_evaluation
radar.kit.edu
radar-service.eu
tar
Updated Jun 21, 2023
Cite
Mervin Seiberlich (2023). experiment_evaluation [Dataset]. http://doi.org/10.35097/1426
Explore at:
tar(89088 bytes)Available download formats
Unique identifier
https://doi.org/10.35097/1426
Dataset updated
Jun 21, 2023
Dataset provided by
Karlsruhe Institute of Technology
Authors
Mervin Seiberlich
Description
🔬️ Experiment Evaluation

Python module for the evaluation of lab experiments. The module implements functions to import meta-data of measurements, filters to search for subsets of them and routines to import and plot data from this meta-data. It works well in its original context but is currently in open alpha since it will be restructured in order to be compatible with new lab environments. Examples of its usage in scientific works will soon be published by the author that can be used to reference it. Feel free to use it for your own projects and to ask questions. For now you can cite this repository as source.

💻️ Installation

You need a running python3 installation on your OS. The module was written on Debian/GNU-Linux, was tested on Windows and should also run on other OS. It is recommended to work in a virtual environment (see the official python documentation; from bash: `python3 -m venv exp_env` followed by `source exp_env/bin/activate`) or a conda installation.

Dependencies

Dependencies are the usual scientific modules like numpy, matplotlib, pandas but also astropy. See requirements.txt, from which you should be able to install the library with:

pip install pip -U # Update pip itself
pip install -r /path/to/requirements.txt

Alternatively, you can also install the required modules from the shell. The author recommends also installing jupyter, which includes the interactive ipython:
```

Example via pip

pip install jupyter
pip install numpy
pip install matplotlib
pip install scipy
pip install pandas
pip install astropy
pip install mplcursors
pip install pynufft

pip install python-slugify # make sure this version of slugify is installed and not 'slugify'

## The module itself

Inside your virtual environment there is a folder `exp_env/lib/python3.../site-packages`. Place the file `experiment_evaluation.py` inside this folder (or a new sub-folder with all your personal scientific code) to make it accessible. From within your code (try it from an interactive ipython session) you should now be able to import it via:

import experiment_evaluation as ee

or from subfolder: import my_scientific_modules.experiment_evaluation as ee

### Matplotlib style

In order to use the fancy custom styles (for example for consistent looking graphs throughout your publication) it is advised to use matplotlib styles. For the provided styles, copy the custom styles "thesis_default.mplstyle" etc. from the folder `stylelib` inside your matplotlib library folder: `lib/python3.9/site-packages/matplotlib/mpl-data/stylelib/*.mplstyle`

# 🧑‍💻 Usage

A good way to learn its usage is to have a look at the [example](examples/example_experiment_evaluation.ipynb) file. But since the module is work in progress we first explain some concepts.

## ✨ Why meta-data?

The module automates several steps of experiment evaluations. But the highlight is its capability to handle experimental meta-data. This enables the user to automatically choose and plot data with a question in mind (example: plot all EQE-curves at -2V and 173Hz) instead of repeatedly choosing files manually. This becomes extremely useful for calculations that need more than one measurement, and also for implementing statistics. Meta-data include things like experimental settings (applied voltage on a diode, time of the measurement, temperature etc.), the experimentalist and technical information (file format, manufacturer of the experimental device etc.). The module includes some generic functions, but to use it for your specific lab environment you might need to add experiment and plot specific functions.

## 💾️ How to save your experiment files?

In general lab measurement files stem from different devices and export routines. So frankly speaking lab-data is often a mess! But to use automatic evaluation tools some sort of system to recognize the measurement-type and store the meta-data is needed. In an ideal world a lab would decide on one file format for all measurements and label them systematically.
To include different data-types and their meta-data within one file-type there exists the *.asdf format (advanced scientific data format, see their [documentation](https://asdf.readthedocs.io/en/stable/index.html) for further insight). So if you are just starting with your PhD try to use this file format everywhere ;). Also, to make experiments distinguishable, every experiment needs a unique identifier. So you should also label every new experiment with an increasing number and the type of the experiment. Example of useful file naming for EQE measurements: `Nr783_EQE.asdf`

In the case of my PhD I decided to use what I found: store the different file formats, store them in folders with the name of the experiment and include meta-data in the file-names (bad example: `EQE/Nr783_3volt_pix1.csv`). This was not the best idea (so learn from what I learned :P). To handle that mess, this module therefore also implements some regular expressions to extract meta-data from file-names (`ee.meta_from_filename()`), but in general it is advised to store all meta-data in the file-header (with the exception of the unique identifier and experiment type). Like this you could store your files in whatever folder structure you like and still find them from within the script. The module then imports meta-data from the files into a database and you can do fancy data-science with your data!

## 📑️ Database

For calculations and filtering of datasets the meta-data and data need to be accessible in a machine readable form. For the time being the module imports all meta-data into a pandas DataFrame that represents our database (for very large datasets this would possibly need to be changed). For this we have to name the root folder that includes all experiment files/folders. **Hint**: If you did not follow the unique labeling/numbering for all your experiments you can still use this module by choosing a root folder that only includes the current experiment.

from pathlib import Path
measurement_root_folder = Path("/home/PhD/Data/")

We can specify some pre-filtering for the specific experiment we want to evaluate:

make use of the '/' operator to build OS independent paths

measurement_folder = measurement_root_folder / "LaserLab" / "proximity-sensor" / "OPD-Lens" / "OPD-Lens_v2"

Define some pre-filter

devices = [nr for nr in range(1035, 1043)] # Unique sample numbers of the experiment listed by list-comprehension
explst = "Mervin Seiberlich"

Then we import the metadata into the pandas DataFrame database via `ee.list_measurements()` and call it *meta-table*:

meta_table = ee.list_measurements(measurement_root_folder, devices, experimentalist=explst, sort_by=["measurement_type", "nr", "pix", "v"])
```

💡️ Advanced note:

Internally ee.list_measurements() uses custom functions to import the experiment specific meta-data. Have a look into the source-code and search for read_meta for an example how this works in detail. With the *.asdf file-format only the generalized import function would be needed.
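To give an idea of what extracting meta-data from a file name can look like, here is a minimal sketch using Python's `re` module on the naming scheme of the bad example `EQE/Nr783_3volt_pix1.csv`. The pattern and the function `parse_filename` are hypothetical and are not the module's actual `ee.meta_from_filename()` or `read_meta` implementation.

```python
import re
from pathlib import Path

# Hypothetical pattern for names like "Nr783_3volt_pix1.csv";
# this is NOT the regular expression used inside the module.
PATTERN = re.compile(r"Nr(?P<nr>\d+)_(?P<v>-?\d+(?:\.\d+)?)volt_pix(?P<pix>\d+)")

def parse_filename(path):
    """Return a dict of meta-data parsed from the file name, or None if it does not match."""
    p = Path(path)
    match = PATTERN.search(p.stem)
    if match is None:
        return None
    return {
        "measurement_type": p.parent.name,  # folder name, e.g. "EQE"
        "nr": int(match["nr"]),             # unique sample number
        "v": float(match["v"]),             # applied bias in volts
        "pix": int(match["pix"]),           # pixel index
    }

meta = parse_filename("EQE/Nr783_3volt_pix1.csv")
```

Dictionaries like `meta` could then be collected into rows of a pandas DataFrame, which is the role the meta-table plays in the text above.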

Import data and meta-data

To import now some measurement data for plotting we use the information inside meta_table with custom import routines and python dictionaries implementing our filters: ```

Distinguish between reference and other measurements

lens = {"nr":devices[:5]}
ref = {"nr":devices[5:]}

Select by bias and compare reference samples with lens (**dict unpacks the values to combine two or more dictionaries)

eqe_lens_0V = ee.import_eqe(meta_table, mask_dict={**lens, **{"v":0}})
eqe_ref_0V = ee.import_eqe(meta_table, mask_dict={**ref, **{"v":0}})
```
This yields python lists `eqe_lens_0V = [table1, table2, ... tableN]` with the selected data ready for plotting (lists are maybe not smart for huge datasets, and some N-dimensional object can replace this in future). Note: The tables inside the list are astropy.QTable() objects including the data and meta-data, as well as units! So with these few lines of code you already did some advanced data filtering and import!
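The dictionary-unpacking idiom used for `mask_dict` can be tried in isolation. In the sketch below the meta-data rows and the helper `matches` are hypothetical; how `ee.import_eqe` actually interprets `mask_dict` is not shown here, so treat the filtering logic as an assumption.

```python
# Hypothetical meta-data rows; in the module these live in a pandas DataFrame
rows = [
    {"nr": 1035, "v": 0, "pix": 1},
    {"nr": 1035, "v": -2, "pix": 1},
    {"nr": 1040, "v": 0, "pix": 1},
]

devices = list(range(1035, 1043))
lens = {"nr": devices[:5]}   # samples 1035..1039
ref = {"nr": devices[5:]}    # samples 1040..1042

def matches(row, mask_dict):
    """True if every key in mask_dict matches the row (a list means 'any of these')."""
    for key, wanted in mask_dict.items():
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        if row.get(key) not in allowed:
            return False
    return True

# {**a, **b} merges two dictionaries; a plain key can also be added directly
lens_0V = [r for r in rows if matches(r, {**lens, **{"v": 0}})]
ref_0V = [r for r in rows if matches(r, {**ref, "v": 0})]
```

Note that every dictionary inside the braces needs its own `**`; writing `{**lens, {"v": 0}}` without the second `**` is a syntax error.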

🌡️ Physical units

The module astropy includes a submodule astropy.units. Since we deal with real world data, it is a good idea to also include units in calculations.
```
import astropy.units as u

Radius of one microlens:

r = 98 * u.um ```

📝️ Calculations

If you have to repeatedly do some advanced calculations or fits for some plots, include them as functions in the source-code. An example would be ee.pink_noise()

📊️ Plots

For plotting there exist many modules in python. Due to its great power we use matplotlib. This comes at the cost of some complexity (definitely have a look at its documentation!). But this enables us, for example, to have a consistent color style, figure-size and text-size in large projects like a PhD-thesis:

mpl.style.use(["thesis_default", "thesis_talk"]) # We use style-sheets to set things like figure-size and text-size, see https://matplotlib.org/stable/tutorials/introductory/customizing.html#composing-styles
w,h = plt.rcParams['figure.figsize'] # get the default size for figures to scale plots accordingly

In order to not reinvent the wheel over and over again it makes sense to wrap some plotting routines for each experiment inside some custom functions. For further detail see the documentation/recommended function signature for matplotlib specialized functions. This enables easy experiment-type specific plotting (even with statistics) once all functions are set up:
```

%% plot eqe statistics

fig, ax = plt.subplots(1,1, figsize=(w, h), layout="constrained")
ee.plot_eqe(ax, eqe_lens_0V, statistics=True, color="tab:green", plot_type="EQE", marker=True, ncol=2)
ee.plot_eqe(ax, eqe_ref_0V, statistics=True, color="tab:blue", plot_type="EQE", marker=True,

Cite

Seair Exim, Python Import Data India – Buyers & Importers List [Dataset]. https://www.seair.co.in

Python Import Data India – Buyers & Importers List

Seair Exim Solutions

Seair Info Solutions PVT LTD

Explore at:

23 scholarly articles cite this dataset (View in Google Scholar)

.bin, .xml, .csv, .xlsAvailable download formats

Dataset provided by

Seair Info Solutions PVT

Authors

Seair Exim

Area covered

India

Description

Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
