Subscribers can find out export and import data of 23 countries by HS code or product's name. This demo is helpful for market analysis.
Python International Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
Open Context (https://opencontext.org) publishes free and open access research data for archaeology and related disciplines. An open source (but bespoke) Django (Python) application supports these data publishing services. The software repository is here: https://github.com/ekansa/open-context-py
The Open Context team runs ETL (extract, transform, load) workflows to import data contributed by researchers from various source relational databases and spreadsheets. Open Context uses a PostgreSQL (https://www.postgresql.org) relational database to manage these imported data in a graph-style schema. The Open Context Python application interacts with the PostgreSQL database via the Django Object-Relational Model (ORM).
This database dump includes all published structured data organized and used by Open Context (table names that start with 'oc_all_'). The binary media files referenced by these structured data records are stored elsewhere. Binary media files for some projects, still in preparation, are not yet archived with long-term digital repositories.
These data comprehensively reflect the structured data currently published and publicly available on Open Context. Other data (such as user and group information) used to run the Website are not included.
IMPORTANT
This database dump contains data from more than 190 different projects. Each project dataset has its own metadata and citation expectations. If you use these data, you must cite each data contributor appropriately, not just this Zenodo-archived database dump.
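As an illustration, once the dump has been restored into a local PostgreSQL instance, the published tables can be listed from Python. This is a minimal sketch: the connection parameters are placeholders, and it assumes the psycopg2 package is installed.
import psycopg2

# Connection parameters are placeholders; adjust them for your own instance.
conn = psycopg2.connect(dbname='opencontext', user='postgres', host='localhost')
with conn, conn.cursor() as cur:
    # All published structured data lives in tables prefixed with 'oc_all_'.
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' AND table_name LIKE 'oc_all_%' "
        "ORDER BY table_name"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)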
Antonin Python Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
Author: Andrew J. Felton
Date: 5/5/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably in this project.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available is separate .csv files for each land cover type. Oterh data can be found for the minimum, monthly, and seasonal transit time found in their respective folders. These data were produced using the python code found in the "supporting_code" folder given the ease of working with .nc and EASE grid in the xarray python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a particular function:
01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.
02_functions.R: This script contains the custom function for this analysis, primarily to work with importing the seasonal transit data. Load this using the source() function in the 01_start.R script.
03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.
04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each land cover type. You load in these data from the 01_start.R script using the source() function.
05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each land cover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.
06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study that then get saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically.
This dataset provides the raw data associated with the NCBI GEO accession number GSE183947. The underlying data is an RNA-Sequencing (RNA-Seq) expression matrix derived from matched normal and malignant breast cancer tissue samples. The primary goal of this resource is to teach the complete workflow of:
- Downloading and importing high-throughput genomics data from public repositories.
- Cleaning and normalizing the raw expression values (e.g., FPKM/TPM).
- Preparing the data structure for downstream Differential Gene Expression (DEG) analysis.
This resource is essential for anyone practicing translational bioinformatics and cancer research.
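The import and normalization steps can be sketched in Python as below. The file name is a placeholder for the expression matrix downloaded from GEO accession GSE183947, and the log transform is just one common normalization choice.
import numpy as np
import pandas as pd

# Placeholder file name for the downloaded GSE183947 expression matrix
# (genes as rows, samples as columns).
expr = pd.read_csv('GSE183947_fpkm.csv', index_col=0)
print(expr.shape)

# A simple transform often applied before differential expression analysis.
expr_log = np.log2(expr + 1)
print(expr_log.head())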
Ballroom Python South Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
This project developed a comprehensive data management system designed to support collaborative groundwater research across institutions by establishing a centralized, structured database for hydrologic time series data. Built on the Observations Data Model (ODM), the system stores time series data and metadata in a relational SQLite database. Key project components included database construction, automation of data formatting and importation, development of analytical and visualization tools, and integration with ArcGIS for geospatial representation. The data import workflow standardizes and validates diverse .csv datasets by aligning them with ODM formatting. A Python-based module was created to facilitate data retrieval, analysis, visualization, and export, while an interactive map feature enables users to explore site-specific data availability. Additionally, a custom ArcGIS script was implemented to generate maps that incorporate stream networks, site locations, and watershed boundaries using DEMs from USGS sources. The system was tested using real-world datasets from groundwater wells and surface water gages across Utah, demonstrating its flexibility in handling diverse formats and parameters. The relational structure enabled efficient querying and visualization, and the developed tools promoted accessibility and alignment with FAIR principles.
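A retrieval query against an ODM-style SQLite database might look like the sketch below. The file name is a placeholder, and the DataValues/Sites table and column names are assumptions based on the Observations Data Model convention, not this project's documented schema.
import sqlite3
import pandas as pd

# Placeholder path to the ODM SQLite database described above.
conn = sqlite3.connect('groundwater_odm.sqlite')

# Pull a time series with its site name; table and column names follow
# ODM conventions (DataValues, Sites) and may differ in the actual database.
query = '''
    SELECT dv.LocalDateTime, dv.DataValue, s.SiteName
    FROM DataValues AS dv
    JOIN Sites AS s ON s.SiteID = dv.SiteID
    ORDER BY dv.LocalDateTime
'''
series = pd.read_sql_query(query, conn)
print(series.head())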
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically.
The Russian Financial Statements Database (RFSD) is an open, harmonized collection of annual unconsolidated financial statements of the universe of Russian firms:
🔓 First open data set with information on every active firm in Russia.
🗂️ First open financial statements data set that includes non-filing firms.
🏛️ Sourced from two official data providers: Rosstat and the Federal Tax Service.
📅 Covers 2011-2023 initially and will be continuously updated.
🏗️ Restores as much data as possible through non-invasive data imputation, statement articulation, and harmonization.
The RFSD is hosted on 🤗 Hugging Face and Zenodo and is stored in Apache Parquet, a structured, column-oriented, compressed binary format, with a yearly partitioning scheme, enabling end users to query only the variables of interest at scale.
The accompanying paper provides internal and external validation of the data: http://arxiv.org/abs/2501.05841.
Here we present the instructions for importing the data in an R or Python environment. Please consult the project repository for more information: http://github.com/irlcode/RFSD.
Importing The Data
You have two options to ingest the data: download the .parquet files manually from Hugging Face or Zenodo, or rely on the 🤗 Hugging Face Datasets library.
Python
🤗 Hugging Face Datasets
It is as easy as:
from datasets import load_dataset
import polars as pl

# Load the entire dataset via the Hugging Face Datasets library
RFSD = load_dataset('irlspbru/RFSD')

# Or read a single year's partition directly from the Hub with polars
RFSD_2023 = pl.read_parquet('hf://datasets/irlspbru/RFSD/RFSD/year=2023/*.parquet')
Please note that the data is not shuffled within a year, meaning that streaming the first n rows will not yield a random sample.
Local File Import
Importing in Python requires the pyarrow package to be installed.
import pyarrow.dataset as ds
import polars as pl

# Open the local copy of the RFSD as a partitioned Arrow dataset
RFSD = ds.dataset('local/path/to/RFSD')
# Inspect the schema
print(RFSD.schema)
# Load the entire dataset into a polars DataFrame
RFSD_full = pl.from_arrow(RFSD.to_table())
# Load only the 2019 partition
RFSD_2019 = pl.from_arrow(RFSD.to_table(filter=ds.field('year') == 2019))
# Load only the taxpayer identifier and revenue (line 2110) for 2019
RFSD_2019_revenue = pl.from_arrow(
    RFSD.to_table(filter=ds.field('year') == 2019, columns=['inn', 'line_2110'])
)
# Rename the technical column names to descriptive ones
renaming_df = pl.read_csv('local/path/to/descriptive_names_dict.csv')
RFSD_full = RFSD_full.rename(
    {item[0]: item[1] for item in zip(renaming_df['original'], renaming_df['descriptive'])}
)
R
Local File Import
Importing in R requires the arrow package to be installed.
library(arrow)
library(data.table)

# Open the local copy of the RFSD as a partitioned Arrow dataset
RFSD <- open_dataset("local/path/to/RFSD")
# Inspect the schema
schema(RFSD)
# Load the entire dataset into a data.table
scanner <- Scanner$create(RFSD)
RFSD_full <- as.data.table(scanner$ToTable())
# Load only the 2019 partition
scan_builder <- RFSD$NewScan()
scan_builder$Filter(Expression$field_ref("year") == 2019)
scanner <- scan_builder$Finish()
RFSD_2019 <- as.data.table(scanner$ToTable())
# Load only the taxpayer identifier and revenue (line 2110) for 2019
scan_builder <- RFSD$NewScan()
scan_builder$Filter(Expression$field_ref("year") == 2019)
scan_builder$Project(cols = c("inn", "line_2110"))
scanner <- scan_builder$Finish()
RFSD_2019_revenue <- as.data.table(scanner$ToTable())
# Rename the technical column names to descriptive ones
renaming_dt <- fread("local/path/to/descriptive_names_dict.csv")
setnames(RFSD_full, old = renaming_dt$original, new = renaming_dt$descriptive)
Use Cases
🌍 For macroeconomists: Replication of a Bank of Russia study of the cost channel of monetary policy in Russia by Mogiliat et al. (2024) — interest_payments.md
🏭 For IO: Replication of the total factor productivity estimation by Kaukin and Zhemkova (2023) — tfp.md
🗺️ For economic geographers: A novel model-less house-level GDP spatialization that capitalizes on geocoding of firm addresses — spatialization.md
FAQ
Why should I use this data instead of Interfax's SPARK, Moody's Ruslana, or Kontur's Focus?
To the best of our knowledge, the RFSD is the only open data set with up-to-date financial statements of Russian companies published under a permissive licence. Apart from being free to use, the RFSD benefits from data harmonization and error detection procedures unavailable in commercial sources. Finally, the data can be easily ingested in any statistical package with minimal effort.
What is the data period?
We provide financials for Russian firms in 2011-2023. We will add the data for 2024 by July 2025 (see Version and Update Policy below).
Why are there no data for firm X in year Y?
Although the RFSD strives to be an all-encompassing database of financial statements, end users will encounter data gaps:
We do not include financials for firms that we considered ineligible to submit financial statements to the Rosstat/Federal Tax Service by law: financial, religious, or state organizations (state-owned commercial firms are still in the data).
Eligible firms may enjoy the right not to disclose under certain conditions. For instance, Gazprom did not file in 2022 and we had to impute its 2022 data from 2023 filings. Sibur filed only in 2023, Novatek only in 2020 and 2021. Commercial data providers such as Interfax's SPARK enjoy dedicated access to the Federal Tax Service data and are therefore able to source this information elsewhere.
A firm may have submitted its annual statement but, according to the Uniform State Register of Legal Entities (EGRUL), it was not active in that year. We remove those filings.
Why is the geolocation of firm X incorrect?
We use Nominatim to geocode structured addresses of incorporation of legal entities from the EGRUL. There may be errors in the original addresses that prevent us from geocoding firms to a particular house. Gazprom, for instance, is geocoded up to the house level in 2014 and 2021-2023, but only at the street level for 2015-2020 due to improper handling of the house number by Nominatim; in that case we have fallen back to street-level geocoding. Additionally, streets in different districts of one city may share identical names. We have ignored those problems in our geocoding and invite your submissions. Finally, the address of incorporation may not correspond with plant locations. For instance, Rosneft has 62 field offices in addition to the central office in Moscow. We ignore the location of such offices in our geocoding, but subsidiaries set up as separate legal entities are still geocoded.
Why is the data for firm X different from https://bo.nalog.ru/?
Many firms submit correcting statements after the initial filing. While we downloaded the data well past the April 2024 deadline for 2023 filings, firms may have kept submitting correcting statements. We will capture them in future releases.
Why is the data for firm X unrealistic?
We provide the source data as is, with minimal changes. Consider a relatively unknown LLC Banknota. It reported 3.7 trillion rubles in revenue in 2023, or 2% of Russia's GDP. This is obviously an outlier firm with unrealistic financials. We manually reviewed the data and flagged such firms for user consideration (variable outlier), keeping the source data intact.
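Building on the Python import example above, flagged firms can be excluded before any aggregate analysis. This is a minimal sketch; it assumes the outlier flag is stored as a boolean column.
# Drop firms flagged as outliers (the `outlier` variable) before analysis;
# this assumes the flag is a boolean column in the RFSD_full frame loaded earlier.
RFSD_clean = RFSD_full.filter(pl.col('outlier') == False)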
Why is the data for groups of companies different from their IFRS statements?
We should stress that we provide unconsolidated financial statements filed according to the Russian accounting standards, meaning that it would be wrong to infer financials for corporate groups from these data. Gazprom, for instance, had over 800 affiliated entities, and to study this corporate group in its entirety it is not enough to consider the financials of the parent company.
Why is the data not in CSV?
The data is provided in the Apache Parquet format. This is a structured, column-oriented, compressed binary format allowing for conditional subsetting of columns and rows. In other words, you can easily query the financials of companies of interest, keeping only the variables of interest in memory and greatly reducing the data footprint.
Version and Update Policy
Version (SemVer): 1.0.0.
We intend to update the RFSD annually as the data becomes available, in other words when most of the firms have their statements filed with the Federal Tax Service. The official deadline for filing the previous year's statements is April 1. However, every year a portion of firms either fails to meet the deadline or submits corrections afterwards. Filing continues up to the very end of the year, but after the end of April this stream quickly thins out. Nevertheless, there is obviously a trade-off between data completeness and timely version availability. We find it a reasonable compromise to query new data in early June, since on average by the end of May 96.7% of statements are already filed, including 86.4% of all correcting filings. We plan to make a new version of the RFSD available by July.
Licence
Creative Commons Attribution 4.0 International (CC BY 4.0).
Copyright © the respective contributors.
Citation
Please cite as:
@unpublished{bondarkov2025rfsd,
  title={{R}ussian {F}inancial {S}tatements {D}atabase},
  author={Bondarkov, Sergey and Ledenev, Victor and Skougarevskiy, Dmitriy},
  note={arXiv preprint arXiv:2501.05841},
  doi={https://doi.org/10.48550/arXiv.2501.05841},
  year={2025}
}
Acknowledgments and Contacts
Data collection and processing: Sergey Bondarkov, sbondarkov@eu.spb.ru, Viktor Ledenev, vledenev@eu.spb.ru
Project conception, data validation, and use cases: Dmitriy Skougarevskiy, Ph.D.,
GNU General Public License v3.0 (GPL-3.0): https://www.gnu.org/licenses/gpl-3.0.html
Data pulled from Traffy Fondue by accessing the Traffy Fondue Open API, covering January 2022 until January 2025.
The following code pulled the data:
import os
import json
import requests
from datetime import datetime, timedelta
import time

class TraffyDataFetcher:
    def __init__(self, start_date, subfolder='traffyfonduedata'):
        # Public Traffy Fondue API endpoint and query defaults
        self.url = "https://publicapi.traffy.in.th/share/teamchadchart/search"
        self.query = {'offset': '0'}
        self.payload = {}
        self.headers = {}
        self.start_date = datetime.strptime(start_date, '%Y-%m-%d')
        self.end_date = datetime.now()
        self.subfolder = subfolder
        self.max_requests_per_minute = 99
        if not os.path.exists(self.subfolder):
            os.makedirs(self.subfolder)

    def add_days_to_date(self, start_date_str, days_to_add):
        start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
        new_date = start_date + timedelta(days=days_to_add)
        return new_date.strftime('%Y-%m-%d')

    def fetch_data(self):
        # Walk through the date range in 10-day windows and save each response as JSON
        current_date = self.start_date
        index = 0
        while current_date <= self.end_date:
            start_time = datetime.now()
            self.query['start'] = current_date.strftime('%Y-%m-%d')
            new_date = self.add_days_to_date(self.query['start'], 10)
            self.query['end'] = new_date
            response = requests.request("GET", self.url, headers=self.headers, data=self.payload, params=self.query)
            print(f"offset: {index} response: {response.status_code}")
            filename = f"traffy_{current_date.strftime('%Y-%m-%d')}.json"
            file_path = os.path.join(self.subfolder, filename)
            with open(file_path, "w") as outfile:
                json_object = json.dumps(response.json(), indent=4)
                outfile.write(json_object)
            end_time = datetime.now()
            elapsed_time = (end_time - start_time).total_seconds()
            print(f"Elapsed time: {elapsed_time} s")
            index += 950
            current_date = datetime.strptime(new_date, '%Y-%m-%d') + timedelta(days=1)
            if index % self.max_requests_per_minute == 0:
                # Throttle to stay under the request-per-minute limit
                # (clamped at zero so a long request never causes a negative sleep)
                time.sleep(max(0, 60 - elapsed_time))

if __name__ == "__main__":
    fetcher = TraffyDataFetcher(start_date='2022-01-01')
    fetcher.fetch_data()
--
And the following code converted the JSON files to CSV files:
import os
import glob
import json
import pandas as pd
#import numpy as np

class TraffyJSONFixer:
    def __init__(self, path_to_json='*.json', subfolder='traffyfonduedata'):
        self.path_to_json = path_to_json
        self.subfolder = subfolder
        self.outputfolder = 'fixedjson'
        self.excelfolder = 'exceloutput'
        self.file_path = os.path.join(self.subfolder, self.path_to_json)
        self.json_files = glob.glob(self.file_path)
        # Ensure the subfolder exists
        if not os.path.exists(self.subfolder):
            os.makedirs(self.subfolder)
        # Ensure the outputfolder exists
        if not os.path.exists(self.outputfolder):
            os.makedirs(self.outputfolder)
        # Ensure the excelfolder exists
        if not os.path.exists(self.excelfolder):
            os.makedirs(self.excelfolder)
        # Debugging: Print the current working directory and the list of JSON files
        print(f"Current working directory: {os.getcwd()}")
        print(f"Found JSON files: {self.json_files}")

    def fix_json_files(self):
        # Unwrap each API response so the output file holds only the 'results' payload
        for count, ele in enumerate(self.json_files):
            new_file_name = os.path.join(self.outputfolder, f"data_{os.path.basename(ele)}")
            try:
                with open(ele, 'r', encoding='utf-8') as f:
                    data = json.load(f)
                # Debugging: Print the type of data
                print(f"Processing file: {ele}")
                print(f"Type of data: {type(data)}")
                # Handle different JSON structures
                if isinstance(data, dict) and "results" in data:
                    results = data["results"]
                elif isinstance(data, list):
                    results = data
                else:
                    print(f"Unexpected JSON structure in file: {ele}")
                    continue
                # Ensure results is a list or dict before writing
                if isinstance(results, (list, dict)):
                    with open(new_file_name, 'w', encoding='utf-8') as f:
                        f.write(json.dumps(results, indent=4))
                else:
                    print(f"Unexpected type for results in file: {ele}")
            except (json.JSONDecodeError, KeyError) as e:
                print(f"Error processing file {ele}: {e}")

    def jsontoexcel(self):
        jsonfile_path = os.path.join(self.out...
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
Introduction
This study has exploited the daily weather records of the Seungjeongwon Ilgi from the NIKH database. The Seungjeongwon Ilgi (http://sjw.history.go.kr/main.do) is a daily record of the Seungjeongwon, the Royal Secretariat of the Joseon Dynasty of Korea. These diaries span from 1623 to 1910 and generally include daily weather records in the entry header. The observational site was located in Seoul (N37°35′, E126°59′). We have exploited the weather records from the NIKH database and classified the daily weather using a text-mining method. We have also converted the report dates from the traditional lunisolar calendar to the Gregorian calendar, to better contextualise our data with contemporary daily measurements.
Data
We provide different formats (csv, xlsx, json) to facilitate the usage of the data. The main contents of the data are listed below.
Import Data
# Python
import pandas as pd

# CSV file
data = pd.read_csv('~/SJWilgi_Seoul_Weather_YR1623_1910.csv', encoding="utf-8")
# JSON file
data = pd.read_json('~/SJWilgi_Seoul_Weather_YR1623_1910.json', encoding="utf-8")
# Excel file
data = pd.read_excel('~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx')
# R
# CSV file
library(readr)
data<- read_csv("~/SJWilgi_Seoul_Weather_YR1623_1910.csv")
# Excel file
library(readxl)
data <- read_excel("~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx")
SSURGO Portal
The newest version of SSURGO Portal with Soil Data Viewer is available via the Quick Start Guide. Install Python to C:\Program Files. This is a different version than what ArcGIS Pro uses.
If you need data for multiple states, we also offer a prebuilt large database with all SSURGO for the entire United States and all Islands. The prebuilt database saves you time, but it's large and takes a while to download.
You can also use the prebuilt gNATSGO GeoPackage database in SSURGO Portal – Soil Data Viewer. Read the ReadMe.txt in the folder. More about gNATSGO here.
You can also import STATSGO2 data into SSURGO Portal and create a database to use in Soil Data Viewer – available for download via the Soils Box folder.

SSURGO Portal Notes
This 10-minute video covers it all, other than installation of SSURGO Portal and the GIS tool. Installation is typically smooth and easy.
There is also a user guide on the SSURGO Portal website that can be very helpful. It has info about using the data in ArcGIS Pro or QGIS.
The SQLite SSURGO database can be opened and queried with DB Browser. It's essentially a free Microsoft Access. (A short Python example for querying the database follows at the end of this entry.)
Guidance about setting up DB Browser to easily open SQLite databases is available in section 4 of this Installation Guide.

Workflow if you need to make your own database
1. Install SSURGO Portal.
2. Install the SSURGO Downloader GIS tool (refer to the Installation and User Guide for assistance). There is one for QGIS and one for ArcGIS Pro; they both do the same thing.
3. Quickly download California SSURGO data with the tool. Enter a two-digit state symbol followed by an asterisk in "Search by Areasymbol" to download all data for a state. For example, enter CA* to batch download all data for California.
4. Open SSURGO Portal and create a new SQLite SSURGO Template database (refer to the User Guide for assistance).
5. Import the SSURGO data you downloaded into the database. You can import SSURGO data from many states at once, building a database that spans many states.
6. After the SSURGO data is done importing, click on the Soil Data Viewer tab and run ratings. These are the exact same ratings as Web Soil Survey. A new table is added to your database for each rating. You can search for ratings by keyword.
7. If desired, open the database in GIS and make a map (refer to the User Guide for assistance).

Workflow if you use the large prebuilt database (don't make your own database)
1. Install SSURGO Portal.
2. In SSURGO Portal, browse to the unzipped prebuilt GeoPackage database with all SSURGO (the prebuilt large database with all SSURGO, or the gNATSGO GeoPackage database).
3. In SSURGO Portal, click on the Soil Data Viewer tab and run ratings. These are the exact same ratings as Web Soil Survey. A new table is added to your database for each rating. You can search for ratings by keyword.
4. If desired, open the database in GIS and make a map.

If you have trouble installing SSURGO Portal, it's usually the connection with Python. Create a desktop shortcut that tells SSURGO Portal which Python to use (these steps were created for Windows 11):
1. Right click anywhere on your desktop and choose New > Shortcut.
2. In the text bar, enter the path to python.exe followed by the path to the SSURGO Portal.pyz file. Example of format: "C:\Program Files\Python310\python.exe" "C:\SSURGO Portal\SSURGO_Portal-0.3.0.8.pyz". Include the quotation marks. Paths may be different on your machine. To avoid typing, you can browse to python.exe in Windows Explorer, right click, select "Copy as Path", and paste the result into the box; then do the same for the SSURGO Portal.pyz file, pasting to the right of the python.exe path.
3. Click Next.
4. Name the shortcut anything you want.
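As referenced above, the SQLite SSURGO database can also be queried outside DB Browser. Below is a minimal Python sketch; the file path is a placeholder for the Template database you created or downloaded.
import sqlite3

# Placeholder path to the SSURGO Template (SQLite) database.
conn = sqlite3.connect(r'C:\path\to\your\ssurgo_template.sqlite')

# List every table, including the rating tables added by Soil Data Viewer.
cur = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
for (name,) in cur.fetchall():
    print(name)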
View details of Acerola Extract import data and shipment reports in the US with product description, price, date, quantity, major US ports, countries, and the US buyers/importers and overseas suppliers/exporters lists.
Dataset Card for Python-DPO
This dataset is the larger version of the Python-DPO dataset and has been created using Argilla.
Load with datasets
To load this dataset with datasets, you'll just need to install datasets with pip install datasets --upgrade and then use the following code:
from datasets import load_dataset
ds = load_dataset("NextWealth/Python-DPO")
Data Fields
Each data instance contains:
instruction: The problem description/requirements
chosen_code: …
See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO-Large.
This dataset helps you practice the data-cleaning process using the pure Python pandas library.
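The kind of cleaning steps it is meant to exercise can be sketched as below; the file name and columns are placeholders, since the dataset's contents are not described here.
import pandas as pd

# Placeholder file name; adjust to the actual file in the dataset.
df = pd.read_csv('raw_data.csv')
df = df.drop_duplicates()                 # remove duplicate rows
df = df.dropna(how='all')                 # drop rows that are entirely empty
df.columns = [c.strip().lower().replace(' ', '_') for c in df.columns]  # tidy headers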
Python Llc Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.