Csv Exports And Imports Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
https://crawlfeeds.com/privacy_policy
The Dog Food Data Extracted from Chewy (USA) dataset contains 4,500 detailed records of dog food products sourced from one of the leading pet supply platforms in the United States, Chewy. This dataset is ideal for businesses, researchers, and data analysts who want to explore and analyze the dog food market, including product offerings, pricing strategies, brand diversity, and customer preferences within the USA.
The dataset includes essential information such as product names, brands, prices, ingredient details, product descriptions, weight options, and availability. Organized in a CSV format for easy integration into analytics tools, this dataset provides valuable insights for those looking to study the pet food market, develop marketing strategies, or train machine learning models.
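As a quick illustration of loading the CSV for analysis, here is a minimal pandas sketch; the file name and column names (e.g., brand, price) are assumptions for illustration and may differ from the actual export.

```python
import pandas as pd

# Hypothetical file and column names -- adjust to match the actual CSV export.
df = pd.read_csv("chewy_dog_food.csv")

# Quick overview: number of products per brand and average listed price by brand.
print(df["brand"].value_counts().head(10))
print(df.groupby("brand")["price"].mean().sort_values(ascending=False).head(10))
```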
Key Features:
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This data set is a subset of the "Records of foreign capital" ("Registros de capitais estrangeiros", RCE) published by the Central Bank of Brazil (CBB) on their website. The data set consists of three data files and three corresponding metadata files. All files are in openly accessible .csv or .txt formats. See the detailed outline below for the data contained in each. Data files contain transaction-specific data such as unique identifier, currency, cancelled status and amount. Metadata files outline the variables in the corresponding data file.

- RCE_Unclean_full_dataset.csv - all transactions published to the Central Bank website from the four main categories outlined below
- Metadata_Unclean_full_dataset.csv
- RCE_Unclean_cancelled_dataset.csv - data extracted from the RCE_Unclean_full_dataset.csv where transactions were registered then cancelled
- Metadata_Unclean_cancelled_dataset.csv
- RCE_Clean_selection_dataset.csv - transaction data extracted from RCE_Unclean_full_dataset.csv and RCE_Unclean_cancelled_dataset.csv for the nine companies and criteria identified below
- Metadata_Clean_selection_dataset.csv

The data include the period between October 2000 and July 2011. This is the only time span for the data provided by the Central Bank of Brazil at this stage. The records were published monthly by the Central Bank of Brazil as required by Art. 66 in Decree nº 55.762 of 17 February 1965, modified by Decree nº 4.842 of 17 September 2003. The records were published on the bank's website starting October 2000, as per communique nº 011489 of 7 October 2003. This remained the case until August 2011, after which the amount of each transaction was no longer disclosed (and publication of these stopped altogether after October 2011). The disclosure of the records was suspended in order to review their legal and technical aspects, and ensure their suitability to the requirements of the rules governing the confidentiality of the information (Law nº 12.527 of 18 November 2011 and Decree nº 7724 of May 2012) (pers. comm. Central Bank of Brazil, 2016. Name of contact available upon request to Authors).

The records track transfers of foreign capital made from abroad to companies domiciled in Brazil, with information on the foreign company (name and country) transferring the money, and on the company receiving the capital (name and federative unit). For the purpose of this study, we consider the four categories of foreign capital transactions which are published with their amount and currency in the Central Bank's data, and which are all part of the "Register of financial transactions" (abbreviated RDE-ROF): loans, leasing, financed import and cash in advance (see below for a detailed description). Additional categories exist, such as foreign direct investment (RDE-IED) and External Investment in Portfolio (RDE-Portfólio), for which no amount is published and which are therefore not included.

We used the data posted online as PDFs on the bank's website, and created a script to extract the data automatically from these four categories into the RCE_Unclean_full_dataset.csv file. This data set has not been double-checked manually and may contain errors. We used a similar script to extract rows from the "cancelled transactions" sections of the PDFs into the RCE_Unclean_cancelled_dataset.csv file. This is useful to identify transactions that have been registered to the Central Bank but later cancelled.
This data set has not been double-checked manually and may contain errors.

From these raw data sets, we conducted the following selections and calculations in order to create the RCE_Clean_selection_dataset.csv file. This data set has been double-checked manually to secure that no errors were made in the extraction process.

We selected all transactions whose recipient company name corresponds to one of these nine companies, or to one of their known subsidiaries in Brazil, according to the list of subsidiaries recorded in the Orbis database, maintained by Bureau Van Dijk. Transactions are included if the recipient company name matches one of the following:
- the current or former name of one of the nine companies in our sample (former names are identified using Orbis, Bloomberg's company profiles or the company website);
- the name of a known subsidiary of one of the nine companies, if and only if we find evidence (in Orbis, Bloomberg's company profiles or on the company website) that this subsidiary was owned at some point during the period 2000-2011, and that it operated in a sector related to the soy or beef industry (including fertilizers and trading activities).

For each transaction, we extracted the name of the company sending capital and, when possible, attributed the transaction to the known ultimate owner. The names of the countries of origin sometimes come with typos or different denominations: we harmonized them.

A manual check of all the selected data unveiled that a few transactions (n=14) appear twice in the database while bearing the same unique identification number. According to the Central Bank of Brazil (pers. comm., November 2016), this is due to errors in their routine of data extraction. We therefore deleted duplicates in our database, keeping only the latest occurrence of each unique transaction. Six (6) transactions recorded with an amount of zero were also deleted. Two (2) transactions registered in August 2003 with incoherent currencies (Deutsche Mark and Dutch guilder, which were demonetised in early 2002) were also deleted.

To secure that the import of data from PDF to the database did not contain any systematic errors, for instance due to mistakes in coding, the data were checked in two ways. First, because the script identifies the end of a row in the PDF using the amount of the transaction, which can sometimes fail if the amount is not entered correctly, we went through the extracted raw data (2798 rows) and cleaned all rows whose end had not been correctly identified by the script. Next, we manually double-checked the 486 largest transactions representing 90% of the total amount of capital inflows, as well as 140 randomly selected additional rows representing 5% of the total rows, compared the extracted data to the original PDFs, and found no mistakes.

Transfers recorded in the database have been made in different currencies, including US dollars, euros, Japanese yen, Brazilian reais, and more. The conversion to US dollars of all amounts denominated in other currencies was done using the average monthly exchange rate as published by the International Monetary Fund (International Financial Statistics: Exchange rates, national currency per US dollar, period average).
Due to the limited time period, we have not corrected for inflation but aggregated nominal amounts in USD over the period 2000-2011.

The categories loans, cash in advance (anticipated payment for exports), financed import, and leasing/rental are those used by the Central Bank of Brazil in their published data. They are denominated respectively:
- "Loans" ("emprestimos" in original source): includes all loans, either contracted directly with creditors or indirectly through the issuance of securities, brokered by foreign agents.
- "Anticipated payment for exports" ("pagamento/renovacao pagamento antecipado de exportacao" in original source): defined as a type of loan (used in trade finance).
- "Financed import" ("importacao financiada" in original source): comprises all import financing transactions, either direct (contracted by the importer with a foreign bank or with a foreign supplier) or indirect (contracted by Brazilian banks with foreign banks on behalf of Brazilian importers). They must be declared to the Central Bank if their term of payment is superior to 360 days.
- "Leasing/rental" ("arrendamento mercantil, leasing e aluguel" in original source): concerns all types of external leasing operations consented by a Brazilian entity to a foreign one. They must be declared if the term of payment is superior to 360 days.

More information about the different categories can be found through the Central Bank online.

(Research Data Support provided by Springer Nature)
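To illustrate the currency-conversion step described above, here is a hedged pandas sketch. The column names (date, currency, amount, recipient_company) and the layout of the IMF monthly rate table are assumptions for illustration; the actual RCE files may use different headers, and the IMF rates would need to be obtained separately.

```python
import pandas as pd

# Hypothetical column names -- adjust to the actual RCE file headers.
rce = pd.read_csv("RCE_Clean_selection_dataset.csv", parse_dates=["date"])
rce["year"] = rce["date"].dt.year
rce["month"] = rce["date"].dt.month

# Monthly average exchange rates (national currency per US dollar), e.g. from the
# IMF International Financial Statistics, with columns: currency, year, month, per_usd.
rates = pd.read_csv("imf_monthly_rates.csv")

# Attach the matching monthly rate to each transaction and convert to USD.
merged = rce.merge(rates, on=["currency", "year", "month"], how="left")
merged["amount_usd"] = merged["amount"] / merged["per_usd"]

# Aggregate nominal USD amounts over 2000-2011, as done in the study.
print(merged.groupby("recipient_company")["amount_usd"].sum().sort_values(ascending=False))
```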
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A large data set of go-arounds (GAs), also referred to as missed approaches. The data set supports the paper presented at the OpenSky Symposium on November 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings from 176 (mostly) large airports in 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33,000 GAs. The data was collected from the OpenSky Network's historical database for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to filter out landings with n_approaches > 2.
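A minimal pandas sketch of that filter, using the published file name:

```python
import pandas as pd

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)

# Drop likely training/calibration flights, which tend to have many approaches.
df = df[df["n_approaches"] <= 2]
```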
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
| registration | string | Aircraft registration |
| typecode | string | Aircraft ICAO typecode |
| icaoaircrafttype | string | ICAO aircraft type |
| wtc | string | ICAO wake turbulence category |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
| operator_country | string | ISO Alpha-3 country code of the operator |
| operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania) |
| wind_speed_knts | integer | METAR, surface wind speed in knots |
| wind_dir_deg | integer | METAR, surface wind direction in degrees |
| wind_gust_knts | integer | METAR, surface wind gust speed in knots |
| visibility_m | float | METAR, visibility in metres |
| temperature_deg | integer | METAR, temperature in degrees Celsius |
| press_sea_level_p | float | METAR, sea level pressure in hPa |
| press_p | float | METAR, QNH in hPa |
| weather_intensity | list | METAR, list of present weather codes: qualifier - intensity |
| weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation |
| weather_desc | list | METAR, list of present weather codes: qualifier - descriptor |
| weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration |
| weather_other | list | METAR, list of present weather codes: weather phenomena - other |
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different websites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| n_landings | integer | Total number of landings observed on this runway in 2019 |
| ga_rate | float | Go-around rate, per 1000 landings |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
This aggregated data set is used in the paper for the generalized linear regression model.
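For readers who want to explore the aggregated file themselves, below is a hedged sketch of one way such a generalized linear model could be set up with statsmodels. It is not necessarily the specification used in the paper, and the reconstruction of go-around counts from ga_rate is an approximation.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

agg = pd.read_csv("go_arounds_agg.csv.gz")

# Approximate GA counts from the published rate (per 1000 landings).
agg["n_ga"] = (agg["ga_rate"] * agg["n_landings"] / 1000).round()

# Illustrative Poisson GLM with the number of landings as exposure (log offset);
# the choice of predictors here is an assumption, not the paper's model.
model = smf.glm(
    "n_ga ~ glide_slope_angle + rwy_length + has_intersection",
    data=agg,
    family=sm.families.Poisson(),
    offset=np.log(agg["n_landings"]),
).fit()
print(model.summary())
```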
Downloading the trajectories
Users of this data set with access to the OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. Suppose, for example, that you want to get all the go-arounds on 4 January 2019 at London City Airport (EGLC). You can use the traffic library for easy access to the database:
```python
import datetime

import pandas as pd
from tqdm.auto import tqdm
from traffic.data import opensky
from traffic.core import Traffic

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(
    tzinfo=datetime.timezone.utc
)
stop = datetime.datetime(year=2019, month=1, day=5).replace(
    tzinfo=datetime.timezone.utc
)

df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

flights = []
delta_time = pd.Timedelta(minutes=10)
for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
    # take at most 10 minutes before and 10 minutes after the landing or go-around
    start_time = row["time"] - delta_time
    stop_time = row["time"] + delta_time

    # fetch the data from OpenSky Network
    flights.append(
        opensky.history(
            start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
            stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
            callsign=row["callsign"],
            return_flight=True,
        )
    )

Traffic.from_flights(flights)
```
Additional files
Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:
validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.
validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.
Eximpedia Export Import trade data lets you search trade data and find active exporters, importers, buyers, suppliers, and manufacturers from over 209 countries.
https://data.gov.tw/license
Provide "Statistics of Import and Export Trade Volume of Each Park" to let the public understand the import and export and its growth trend of each park. In addition to updating this information every month, CSV file format is also provided for free download and use by the public.The dataset includes statistics on the import and export trade volume of parks such as Nanzih, Kaohsiung, Taichung, Zhonggang, Pingtung, and other parks (Lingguang, Chenggong, Gaoruan), with main fields including "Park, Import and Export (This Month, Year-to-Date)", "Export (This Month, Year-to-Date)", "Import (This Month, Year-to-Date)", and other important information.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials
Background
This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and data from the individual experiments are also included.
The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).
Usage
Included Files
File Format: Downsampled Data
These are the "LP_
These data files can be easily loaded using the pandas library in Python through:
```python
import pandas

data = pandas.read_csv(data_file, index_col=0)
```
The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
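For example, a quick hedged sketch of plotting a stress-strain curve from one downsampled file; the column names e_true and Sigma_true are as stated above, while the file path is a placeholder.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder path to one of the downsampled "LP_..." CSV files.
data_file = "LP_example.csv"
data = pd.read_csv(data_file, index_col=0)

# True stress vs. true strain for this coupon test.
plt.plot(data["e_true"], data["Sigma_true"])
plt.xlabel("True strain, e_true")
plt.ylabel("True stress, Sigma_true")
plt.show()
```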
File Format: Unreduced Data
These are the "LP_
The data can be loaded and used similarly to the downsampled data.
File Format: Overall_Summary
The overall summary file provides data on all the test specimens in the database. The columns include:
File Format: Summarized_Mechanical_Props_Campaign
Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,
```python
tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
                   index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
                   keep_default_na=False, na_values='')
```
Caveats
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares FIXED-line broadband internet speeds across five cities:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU
ERRATA: 1. Data is for Q3 2020, but some files are incorrectly labelled as 02-20 or June 20. They should all read Sept 20, or 09-20, i.e. Q3 20 rather than Q2. These will be renamed and reloaded. Amended in v7.
Lines of data for each geojson file (a line equates to a 600m^2 location, including total tests, devices used, and average upload and download speed):
- MEL: 16,181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG: 31,745 lines => 0.65M speedtests (2.5/100pp)
- BKK: 29,296 lines => 1.5M speedtests (14.3/100pp)
- LAX: 15,899 lines => 1.3M speedtests (10.4/100pp)
- ALC: 76 lines => 500 speedtests (2/100pp)
Geojsons of these 2° by 2° extracts for MEL, BKK and SHG are now added; LAX was added in v6 and Alice Springs in v15.
This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.
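To interrogate one of the geojson tile files outside the notebook, a minimal geopandas sketch is shown below. The file name and the speed/test column names are assumptions based on the Speedtest open-data layout and may differ in these extracts.

```python
import geopandas as gpd

# Placeholder file name for one of the city extracts.
tiles = gpd.read_file("MEL_tiles.geojson")

# Assumed Speedtest open-data style columns: avg_d_kbps, tests, devices.
tiles["avg_d_mbps"] = tiles["avg_d_kbps"] / 1000
print(tiles[["avg_d_mbps", "tests", "devices"]].describe())

# Share of tiles at or above 100 Mbps download ("Superfast").
print((tiles["avg_d_mbps"] >= 100).mean())
```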
** To Do: Will add Google Map versions so everyone can see without installing Jupyter.
- Link to Google Map (BKK) added below. Key: Green > 100Mbps (Superfast), Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
- Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plotting the Top 20% on a map for the community. Google Map link now added (and tweet).
** Python

```python
melb = au_tiles.cx[144:146, -39:-37]  # Lat/Lon extract
shg = tiles.cx[120:122, 30:32]        # Lat/Lon extract
bkk = tiles.cx[100:102, 13:15]        # Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35]      # Lat/Lon extract
ALC = tiles.cx[132:134, -22:-24]      # Lat/Lon extract
```
Histograms (v9), and data visualisations (v3,5,9,11) will be provided. Data Sourced from - This is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).
** VERSIONS
- v24: Add tweet and Google Map of Top 20% (over 100Mbps locations) in Mel Q3 22. Add v1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
- v23: Add graph of 2022 broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
- v22: Add import ipynb; workflow-import-4cities.
- v21: Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)
- v20: Speedtest - Five Cities inc ALC.
- v19: Add ALC2.ipynb.
- v18: Add ALC line graph.
- v17: Added ipynb for ALC. Added ALC to title.
- v16: Load Alice Springs data Q2 21 - csv. Added Google Map link of ALC.
- v15: Load Melb Q1 2021 data - csv.
- v14: Added Melb Q1 2021 data - geojson.
- v13: Added Twitter link to pics.
- v12: Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
- v11: Add Line-Compare pic, plotting four cities on a graph.
- v10: Add four histograms in one pic.
- v9: Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
- v8: Renamed LAX file to Q3, rather than 03.
- v7: Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
- v6: Added LAX file.
- v5: Add screenshot of BKK Google Map.
- v4: Add BKK Google Map (link below), and BKK csv mapping files.
- v3: Replaced MEL map with big key version. Previous key was very tiny in top right corner.
- v2: Uploaded MEL, SHG, BKK data and Jupyter Notebook.
- v1: Metadata record.
** LICENCE: The AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- share-alike (SA) on reuse (i.e. add the same licence).
This restricts the standard CC-BY Figshare licence.
** Other uses of Speedtest Open Data; - see link at Speedtest below.
Csv Active S R L Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a local CSV file of WormJam (https://www.tandfonline.com/doi/full/10.1080/21624054.2017.1373939) for MetFrag (https://msbi.ipb-halle.de/MetFrag/).
The text file provided by Michael (also part of this dataset) was modified into CSV by adding identifiers and adjusting headers for MetFrag import.
This CSV file is for users wanting to integrate WormJam into offline MetFrag CL workflows. The file will also be integrated into MetFrag online; there, please use the file in the dropdown menu rather than uploading this one.
Update 10 Sept 2019: curated truncated InChIKey, InChI entries, added missing SMILES, added DTXSIDs by InChIKey match.
This dataset includes all the data and R code needed to reproduce the analyses in a forthcoming manuscript: Copes, W. E., Q. D. Read, and B. J. Smith. Environmental influences on drying rate of spray applied disinfestants from horticultural production services. PhytoFrontiers, DOI pending.

Study description: Instructions for disinfestants typically specify a dose and a contact time to kill plant pathogens on production surfaces. A problem occurs when disinfestants are applied to large production areas where the evaporation rate is affected by weather conditions. The common contact time recommendation of 10 min may not be achieved under hot, sunny conditions that promote fast drying. This study is an investigation into how the evaporation rates of six commercial disinfestants vary when applied to six types of substrate materials under cool to hot and cloudy to sunny weather conditions. Initially, disinfestants with low surface tension spread out to provide 100% coverage, and disinfestants with high surface tension beaded up to provide about 60% coverage, when applied to hard smooth surfaces. Disinfestants applied to porous materials, such as wood and concrete, were quickly absorbed into the body of the material. Even though disinfestants evaporated faster under hot sunny conditions than under cool cloudy conditions, coverage was reduced considerably in the first 2.5 min under most weather conditions and reduced to less than or equal to 50% coverage by 5 min.

Dataset contents: This dataset includes R code to import the data and fit Bayesian statistical models using the model fitting software CmdStan, interfaced with R using the packages brms and cmdstanr. The models (one for 2022 and one for 2023) compare how quickly different spray-applied disinfestants dry, depending on what chemical was sprayed, what surface material it was sprayed onto, and what the weather conditions were at the time. Next, the statistical models are used to generate predictions and compare mean drying rates between the disinfestants, surface materials, and weather conditions. Finally, tables and figures are created. These files are included:
- Drying2022.csv: drying rate data for the 2022 experimental run
- Weather2022.csv: weather data for the 2022 experimental run
- Drying2023.csv: drying rate data for the 2023 experimental run
- Weather2023.csv: weather data for the 2023 experimental run
- disinfestant_drying_analysis.Rmd: RMarkdown notebook with all data processing, analysis, and table creation code
- disinfestant_drying_analysis.html: rendered output of the notebook
- MS_figures.R: additional R code to create figures formatted for journal requirements
- fit2022_discretetime_weather_solar.rds: fitted brms model object for 2022. This allows users to reproduce the model prediction results without having to refit the model, which was originally fit on a high-performance computing cluster
- fit2023_discretetime_weather_solar.rds: fitted brms model object for 2023
- data_dictionary.xlsx: descriptions of each column in the CSV data files
The trak extension for CKAN enhances the platform's tracking capabilities by providing tools to import Google Analytics data and modify the presentation of page view statistics. It introduces a paster command for importing page view data from exported Google Analytics CSV files, enabling users to supplement CKAN's built-in tracking. The extension also includes template customizations to alter how page view counts are displayed on dataset and resource listing pages.

Key Features:
- Google Analytics Data Import: Imports page view data directly from a stripped-down CSV of Google Analytics data using a dedicated paster command (csv2table). The CSV should contain a list of page views, where each row starts with '/'. The PageViews column is expected to be the 3rd column.
- Customizable Page View Display: Changes the default presentation of page view statistics within CKAN, removing the minimum view count restriction (default is 10) so all views can be seen, and modifies UI elements.
- Altered Page Tracking Stats: Alters the placement of page tracking statistics, moving them below Package Data (on dataset list pages) and Resource Data (on resource list pages) for better integration of tracking data.
- UI/UX Enhancements: Replaces the flame icon typically used for page tracking with more subtle background styling to modernize the presentation of tracking data.
- Backend Data Manipulation: Uses a 'floor date' of 2011-01-01 for page view calculation. Entries are made in the trackingraw table for each view, with a unique UUID.

Integration with CKAN: The extension integrates into CKAN's core functionalities by introducing a new paster command and modifying existing templates for displaying page view statistics. It relies on CKAN's built-in tracking being enabled, but supplements its capabilities with imported data and presentation adjustments. After importing data using the csv2table paster command, the standard tracking update and search-index rebuild paster tasks need to be run to process the imported data and update the search index.

Benefits & Impact: By importing data from Google Analytics, the trak extension allows administrators to see a holistic view of page views. It changes the user experience to present tracking statistics in a more integrated fashion. This allows for a better understanding of the impact and utilization of resources within the CKAN instance, based on Google Analytics data.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The data was collected on 2024-04-05 containing 3492 problems.
Cleaned via the following script.
```python
import json
import csv
from io import TextIOWrapper


def clean(data: dict):
    questions = data['data']['problemsetQuestionList']['questions']
    for q in questions:
        yield {
            'id': q['frontendQuestionId'],
            'difficulty': q['difficulty'],
            'title': q['title'],
            'titleCn': q['titleCn'],
            'titleSlug': q['titleSlug'],
            'paidOnly': q['paidOnly'],
            'acRate': round(q['acRate'], 3),
            'topicTags': [t['name'] for t in q['topicTags']],
        }


def out_jsonl(f: TextIOWrapper):
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            for q in clean(data):
                f.write(json.dumps(q, ensure_ascii=False))
                f.write('\n')


def out_json(f: TextIOWrapper):
    l = []
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            for q in clean(data):
                l.append(q)
    json.dump(l, f, ensure_ascii=False)


def out_csv(f: TextIOWrapper):
    writer = csv.DictWriter(f, fieldnames=[
        'id', 'difficulty', 'title', 'titleCn', 'titleSlug', 'paidOnly', 'acRate', 'topicTags'
    ])
    writer.writeheader()
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            writer.writerows(clean(data))


with open('data.jsonl', 'w', encoding='u8') as f:
    out_jsonl(f)
with open('data.json', 'w', encoding='u8') as f:
    out_json(f)
with open('data.csv', 'w', encoding='u8', newline='') as f:
    out_csv(f)
```
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# Crypto Price Monitoring Repository
This repository contains two CSV data files that were created to support the research titled "Price Arbitrage for DeFi Derivatives." This research is to be presented at the IEEE International Conference on Blockchain and Cryptocurrencies, taking place on 5th May 2023 in Dubai, UAE. The data files include monitoring prices for various crypto assets from several sources. The data files are structured with five columns, providing information about the symbol, unified symbol, time, price, and source of the price.
## Data Files
There are two CSV data files in this repository (one for each date):
1. `Pricemon_results_2023_01_13.csv`
2. `Pricemon_results_2023_01_14.csv`
## Data Format
Both data files have the same format and structure, with the following five columns:
1. `symbol`: The trading symbol for the crypto asset (e.g., BTC, ETH).
2. `unified_symbol`: A standardized symbol used across different platforms.
3. `time`: Timestamp for when the price data was recorded (in UTC format).
4. `price`: The price of the crypto asset at the given time (in USD).
5. `source`: The name of the price source for the data.
## Price Sources
The `source` column in the data files refers to the provider of the price data for each record. The sources include:
- `chainlink`: Chainlink Price Oracle
- `mycellium`: Built-in oracle of the Mycellium platform
- `bitfinex`: Bitfinex cryptocurrency exchange
- `ftx`: FTX cryptocurrency exchange
- `binance`: Binance cryptocurrency exchange
## Usage
You can use these data files for various purposes, such as analyzing price discrepancies across different sources, identifying trends, or developing trading algorithms. To use the data, simply import the CSV files into your preferred data processing or analysis tool.
### Example
Here's an example of how you can read and display the data using Python and the pandas library:
```python
import pandas as pd

# Read the data from CSV file
data = pd.read_csv('Pricemon_results_2023_01_13.csv')

# Display the first 5 rows of the data
print(data.head())
```
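Building on the read example above, here is a hedged sketch of one way to compare prices across sources for a single asset: it pivots on the `source` column and computes the spread between the highest and lowest quote at each timestamp. The choice of 'BTC' as a value in `unified_symbol` is an assumption for illustration.

```python
import pandas as pd

data = pd.read_csv('Pricemon_results_2023_01_13.csv')

# One asset, prices arranged as time x source (assumes 'BTC' appears in unified_symbol).
btc = data[data['unified_symbol'] == 'BTC']
wide = btc.pivot_table(index='time', columns='source', values='price')

# Cross-source spread at each timestamp (max minus min quote).
spread = wide.max(axis=1) - wide.min(axis=1)
print(spread.describe())
```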
## Acknowledgements
These datasets were recorded and supported by Datamint (a value-added on-chain data provider) and its team.
## Contributing
If you have any suggestions or find any issues with the data, please feel free to contact the authors.
Disclaimer: This is artificially generated data, created using a Python script based on the arbitrary assumptions listed below.
The data consists of 100,000 examples of training data and 10,000 examples of test data, each representing a user who may or may not buy a smart watch.
----- Version 1 -------
trainingDataV1.csv, testDataV1.csv (or trainingData.csv, testData.csv)

The data includes the following features for each user:
1. age: The age of the user (integer, 18-70)
2. income: The income of the user (integer, 25,000-200,000)
3. gender: The gender of the user (string, "male" or "female")
4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
5. hour: The hour of the day (integer, 0-23)
6. weekend: A boolean indicating whether it is the weekend (True or False)

The data also includes a label for each user indicating whether they are likely to buy a smart watch or not (string, "yes" or "no"). The label is determined based on the following arbitrary conditions:
- If the user is divorced and a random number generated by the script is less than 0.4, the label is "no" (i.e., assuming 40% of divorcees are not likely to buy a smart watch).
- If it is the weekend and a random number generated by the script is less than 1.3, the label is "yes" (i.e., assuming sales are 30% more likely to occur on weekends).
- If the user is male and under 30 with an income over 75,000, the label is "yes".
- If the user is female and 30 or over with an income over 100,000, the label is "yes".
- Otherwise, the label is "no".
The training data is intended to be used to build and train a classification model, and the test data is intended to be used to evaluate the performance of the trained model.
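As a hedged illustration of that intended use (not part of the original dataset documentation), the following sketch trains a simple scikit-learn classifier on trainingData.csv and evaluates it on testData.csv, assuming the test file carries the same columns including the label.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

train = pd.read_csv("trainingData.csv")
test = pd.read_csv("testData.csv")

features = ["age", "income", "gender", "maritalStatus", "hour", "weekend"]
target = "buySmartWatch"

# One-hot encode the categorical columns, pass the numeric ones through unchanged.
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "maritalStatus", "weekend"])],
    remainder="passthrough",
)
clf = Pipeline([
    ("pre", pre),
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

clf.fit(train[features], train[target])
print(accuracy_score(test[target], clf.predict(test[features])))
```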
The following Python script was used to generate this dataset:
```python
import random
import csv

# Set the number of examples to generate
numExamples = 100000

# Generate the training data
with open("trainingData.csv", "w", newline="") as csvfile:
    fieldnames = ["age", "income", "gender", "maritalStatus", "hour", "weekend", "buySmartWatch"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for i in range(numExamples):
        age = random.randint(18, 70)
        income = random.randint(25000, 200000)
        gender = random.choice(["male", "female"])
        maritalStatus = random.choice(["single", "married", "divorced"])
        hour = random.randint(0, 23)
        weekend = random.choice([True, False])

        # Randomly assign the label based on some arbitrary conditions
        # assuming 40% of divorcees won't buy a smart watch
        if maritalStatus == "divorced" and random.random() < 0.4:
            buySmartWatch = "no"
        # assuming sales are 30% more likely to occur on weekends.
        elif weekend == True and random.random() < 1.3:
            buySmartWatch = "yes"
        elif gender == "male" and age < 30 and income > 75000:
            buySmartWatch = "yes"
        elif gender == "female" and age >= 30 and income > 100000:
            buySmartWatch = "yes"
        else:
            buySmartWatch = "no"

        writer.writerow({
            "age": age,
            "income": income,
            "gender": gender,
            "maritalStatus": maritalStatus,
            "hour": hour,
            "weekend": weekend,
            "buySmartWatch": buySmartWatch
        })
```
----- Version 2 -------
trainingDataV2.csv, testDataV2.csv

The data includes the following features for each user:
1. age: The age of the user (integer, 18-70)
2. income: The income of the user (integer, 25,000-200,000)
3. gender: The gender of the user (string, "male" or "female")
4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
5. educationLevel: The education level of the user (string, "high school", "associate's degree", "bachelor's degree", "master's degree", or "doctorate")
6. occupation: The occupation of the user (string, "tech worker", "manager", "executive", "sales", "customer service", "creative", "manual labor", "healthcare", "education", "government", "unemployed", or "student")
7. familySize: The number of people in the user's family (integer, 1-5)
8. fitnessInterest: A boolean indicating whether the user is interested in fitness (True or False)
9. priorSmartwatchOwnership: A boolean indicating whether the user has owned a smartwatch in the past (True or False)
10. hour: The hour of the day when the user was surveyed (integer, 0-23)
11. weekend: A boolean indicating whether the user was surveyed on a weekend (True or False)
12. buySmartWatch: A boolean indicating whether the user purchased a smartwatch (True or False)
Python script used to generate the data:
```python
import random
import csv

# Set the number of examples to generate
numExamples = 100000

with open("t...
```
U.S. consumers demand variety, quality, and convenience in the foods they consume. As Americans have become wealthier and more ethnically diverse, the American food basket reflects a growing share of tropical products, spices, and imported gourmet products. Seasonal and climatic factors drive U.S. imports of popular types of fruits and vegetables and tropical products, such as cocoa and coffee. In addition, a growing share of U.S. imports can be attributed to intra-industry trade, whereby agricultural-processing industries based in the United States carry out certain processing steps offshore and import products at different levels of processing from their subsidiaries in foreign markets. This data set provides import values of edible products (food and beverages) entering U.S. ports and their origin of shipment. Data are from the U.S. Department of Commerce, U.S. Census Bureau. Food and beverage import values are compiled by calendar year into food groups corresponding to major commodities or level of processing. At least 10 years of annual data are included, enabling users to track long-term growth patterns.
Dataset Card for "letter_recognition"
Images in this dataset were generated using the script defined below. The original dataset in CSV format, and more information about it, is available at A-Z Handwritten Alphabets in .csv format.

```python
import os
import pandas as pd
import matplotlib.pyplot as plt

CHARACTER_COUNT = 26

data = pd.read_csv('./A_Z Handwritten Data.csv')
mapping = {str(i): chr(i+65) for i in range(26)}

def generate_dataset(folder, end, start=0):
    if not…
```

See the full description on the dataset page: https://huggingface.co/datasets/pittawat/letter_recognition.
ODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Dataset Overview: This dataset pertains to the examination results of students who participated in a series of academic assessments at a fictitious educational institution named "University of Exampleville." The assessments were administered across various courses and academic levels, with a focus on evaluating students' performance in general management and domain-specific topics.
Columns: The dataset comprises 12 columns, each representing specific attributes and performance indicators of the students. These columns encompass information such as the students' names (which have been anonymized), their respective universities, academic program names (including BBA and MBA), specializations, the semester of the assessment, the type of examination domain (general management or domain-specific), general management scores (out of 50), domain-specific scores (out of 50), total scores (out of 100), student ranks, and percentiles.
Data Collection: The examination data was collected during a standardized assessment process conducted by the University of Exampleville. The exams were designed to assess students' knowledge and skills in general management and their chosen domain-specific subjects. It involved students from both BBA and MBA programs who were in their final year of study.
Data Format: The dataset is available in a structured format, typically as a CSV file. Each row represents a unique student's performance in the examination, while columns contain specific information about their results and academic details.
Data Usage: This dataset is valuable for analyzing and gaining insights into the academic performance of students pursuing BBA and MBA degrees. It can be used for various purposes, including statistical analysis, performance trend identification, program assessment, and comparison of scores across domains and specializations. Furthermore, it can be employed in predictive modeling or decision-making related to curriculum development and student support.
Data Quality: The dataset has undergone preprocessing and anonymization to protect the privacy of individual students. Nevertheless, it is essential to use the data responsibly and in compliance with relevant data protection regulations when conducting any analysis or research.
Data Format: The exam data is typically provided in a structured format, commonly as a CSV (Comma-Separated Values) file. Each row in the dataset represents a unique student's examination performance, and each column contains specific attributes and scores related to the examination. The CSV format allows for easy import and analysis using various data analysis tools and programming languages like Python, R, or spreadsheet software like Microsoft Excel.
Here's a column-wise description of the dataset:
Name OF THE STUDENT: The full name of the student who took the exam. (Anonymized)
UNIVERSITY: The university where the student is enrolled.
PROGRAM NAME: The name of the academic program in which the student is enrolled (BBA or MBA).
Specialization: If applicable, the specific area of specialization or major that the student has chosen within their program.
Semester: The semester or academic term in which the student took the exam.
Domain: Indicates the examination domain for the score, i.e., whether it refers to the general management part or the domain-specific part of the exam.
GENERAL MANAGEMENT SCORE (OUT of 50): The score obtained by the student in the general management part of the exam, out of a maximum possible score of 50.
Domain-Specific Score (Out of 50): The score obtained by the student in the domain-specific part of the exam, also out of a maximum possible score of 50.
TOTAL SCORE (OUT of 100): The total score obtained by adding the scores from the general management and domain-specific parts, out of a maximum possible score of 100.
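A minimal pandas sketch of loading and summarizing such a file is shown below; the file name and exact CSV headers are assumptions based on the column descriptions above and may need adjusting.

```python
import pandas as pd

# Hypothetical file name and headers -- adjust to the actual CSV.
df = pd.read_csv("exam_results.csv")

# Average total score by program (BBA vs MBA) and specialization.
summary = (
    df.groupby(["PROGRAM NAME", "Specialization"])["TOTAL SCORE (OUT of 100)"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary.head(10))
```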
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OMOP2OBO Mappings - N3C OMOP to OBO Working group
This repository stores OMOP2OBO mappings which have been processed for use within the National COVID Cohort Collaborative (N3C) Enclave. The version of the mappings stored in this repository have been specifically formatted for use within the N3C Enclave.
N3C OMOP to OBO Working Group: https://covid.cd2h.org/ontology
Accessing the N3C-Formatted Mappings
You can access the three OMOP2OBO HPO mapping files in the Enclave from the Knowledge store using the following link: https://unite.nih.gov/workspace/compass/view/ri.compass.main.folder.1719efcf-9a87-484f-9a67-be6a29598567.
The mapping set includes three files, but you only need to merge the following two files with existing data in the Enclave in order to be able to create the concept sets:
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv
The first file, OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv, contains columns for the OMOP concept ids and codes, as well as information such as whether or not the OMOP concept's descendants should be included when deriving the concept sets (defaults to FALSE). The other file, OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv, contains details on the mapping's label (i.e., the HPO curie and label in the concept_set_id field) and its provenance/evidence (the specific column to access for this information is called intention).
Creating Concept Sets
Merge these files together on the column named codeset_id and then join them with existing Enclave tables like concept and condition_occurrence to populate the actual concept sets. The name of the concept set can be obtained from the OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv file and is stored as a string in the column called concept_set_id. Although not ideal, obtaining the HPO CURIE and label currently requires applying a regex to this column; this is the best available approach given the fields exposed in the Enclave.
An example mapping is shown below (highlighting some of the most useful columns):
codeset_id: 900000000
concept_set_id: [OMOP2OBO] hp_0002031-abnormal_esophagus_morphology
concept: 23868
code: 69771008
codeSystem: SNOMED
includeDescendants: False
intention: Mixed - This mapping was created using the OMOP2OBO mapping algorithm (https://github.com/callahantiff/OMOP2OBO). The Mapping Category and Evidence supporting the mappings are provided below, by OMOP concept:
- 23868: OBO_DbXref-OMOP_ANCESTOR_SOURCE_CODE:snomed_69771008 | OBO_DbXref-OMOP_CONCEPT_SOURCE_CODE:snomed_69771008 | CONCEPT_SIMILARITY:HP_0002031_0.713
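A hedged pandas sketch of the merge and regex steps described above; the regex assumes the concept_set_id format shown in the example (e.g. "[OMOP2OBO] hp_0002031-abnormal_esophagus_morphology"), and the column names are taken from the file descriptions in this record.

```python
import pandas as pd

items = pd.read_csv("OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv")
versions = pd.read_csv("OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv")

# Merge the expression items with the version metadata on codeset_id.
mappings = items.merge(versions, on="codeset_id", how="left")

# Extract the HPO CURIE and label from the concept_set_id string,
# e.g. "[OMOP2OBO] hp_0002031-abnormal_esophagus_morphology".
extracted = mappings["concept_set_id"].str.extract(r"\[OMOP2OBO\]\s*(hp_\d+)-(.+)")
mappings["hpo_curie"] = extracted[0].str.upper().str.replace("_", ":", n=1)
mappings["hpo_label"] = extracted[1].str.replace("_", " ")

print(mappings[["codeset_id", "hpo_curie", "hpo_label", "concept_id", "code", "codeSystem"]].head())
```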
Release Notes - v2.0.0
Preparation
In order to import data into the Enclave, the following items are needed:
Obtain API Token, which will be included in the authorization header (stored as GitHub Secret)
Obtain username hash from the Enclave
OMOP2OBO Mappings (v1.5.0)
Data
Concept Set Container (concept_set_container): CreateNewConceptSet
Concept Set Version (code_sets): CreateNewDraftOMOPConceptSetVersion
Concept Set Expression Items (concept_set_version_item): addCodeAsVersionExpression
Script
n3c_mapping_conversion.py
Generated Output
The codeset_id needs to be filled from self-generation (ideally, from a conserved range) prior to beginning any of the API steps. The current list of assigned identifiers is stored in the file named omop2obo_enclave_codeset_id_dict_v2.0.0.json. Note that in order to accommodate the 1:Many mappings, the codeset ids were re-generated and, rather than being mapped to HPO concepts, they are mapped to SNOMED-CT concepts. This creates a cleaner mapping and will easily scale to future mapping builds.
To be consistent with OMOP tools, specifically Atlas, we have also created Atlas-formatted JSON files for each mapping, which are stored in the zipped directory named atlas_json_files_v2.0.0.zip. Note that, as mentioned above, to enable the representation of 1:Many mappings the files are no longer named after HPO concepts; they are now named with the OMOP concept_id and label, and additional fields have been added within the JSON files that include the HPO ids, labels, mapping category, mapping logic, and mapping evidence.
File 1: concept_set_container
Generated Data: OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_container.csv
Columns:
concept_set_id
concept_set_name
intention
assigned_informatician
assigned_sme
project_id
status
stage
n3c_reviewer
alias
archived
created_by
created_at
File 2: concept_set_expression_items
Generated Data: OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv
Columns:
codeset_id
concept_id
code
codeSystem
ontology_id
ontology_label
mapping_category
mapping_logic
mapping_evidence
isExcluded
includeDescendants
includeMapped
item_id
annotation
created_by
created_at
File 3: concept_set_version
Generated Data: OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv
Columns:
codeset_id
concept_set_id
concept_set_version_title
project
source_application
source_application_version
created_at
atlas_json
most_recent_version
comments
intention
limitations
issues
update_message
status
has_review
reviewed_by
created_by
provenance
atlas_json_resource_url
parent_version_id
is_draft
Generated Output:
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_container.csv
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_expression_items.csv
OMOP2OBO_v2.0.0_N3C_Enclave_CSV_concept_set_version.csv
atlas_json_files_v2.0.0.zip
omop2obo_enclave_codeset_id_dict_v2.0.0.json
By Noah Rippner [source]
This dataset provides comprehensive information on county-level cancer death and incidence rates, as well as various related variables. It includes data on age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the federal information processing standards (FIPS) codes for each county.
Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.
The purpose of the death.csv file within this dataset is to offer detailed information specifically concerning county-level cancer death rates and related variables. On the other hand, the incd.csv file contains data on county-level cancer incidence rates and additional relevant variables.
To provide more context and understanding about the included data points, there is a separate file named cancer_data_notes.csv. This file serves to provide informative notes and explanations regarding the various aspects of the cancer data used in this dataset.
Please note that this particular description provides an overview of a linear regression walkthrough using this dataset, based on the Python programming language. It highlights how to source and import the data properly before moving into data preparation steps such as exploratory analysis. The walkthrough further covers model selection and important model diagnostic measures.
It's essential to bear in mind that this example serves as an initial attempt at creating a multivariate Ordinary Least Squares regression model using these datasets from various sources like cancer.gov along with US Census American Community Survey data. This baseline model allows easy comparisons with future iterations intended for improvements or refinements.
Important columns found within this extensively documented Kaggle dataset include the county names along with their corresponding FIPS codes, a standardized coding system defined by the Federal Information Processing Standards (FIPS). Moreover, the Met Objective of 45.5? (1) column denotes whether a specific county achieved the targeted objective of a death rate of 45.5 or not.
Overall, this dataset aims to offer valuable insights into county-level cancer death and incidence rates across various regions, providing policymakers, researchers, and healthcare professionals with essential information for analysis and decision-making purposes
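As a minimal sketch of the import and baseline-model steps described in the walkthrough, the snippet below loads death.csv with pandas and fits an illustrative OLS model with statsmodels. The column renames follow the descriptions in the next section; the exact CSV headers and any non-numeric entries may require additional cleaning, and this is not the manuscript's final model.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Load the county-level death data (exact headers may differ).
death = pd.read_csv("death.csv")

# Simplify long column names for use in a formula.
death = death.rename(columns={
    "Age-Adjusted Death Rate": "death_rate",
    "Average Deaths per Year": "avg_deaths_per_year",
    "Met Objective of 45.5? (1)": "met_objective",
})

# Illustrative baseline OLS; real data may need coercion to numeric first.
model = smf.ols("death_rate ~ avg_deaths_per_year", data=death).fit()
print(model.summary())
```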
Familiarize Yourself with the Columns:
- County: The name of the county.
- FIPS: The Federal Information Processing Standards code for the county.
- Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).
- Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.
- Average Deaths per Year: The average number of deaths per year due to cancer in the county.
- Recent Trend (2): The recent trend in cancer death rates/incidence in the county.
- Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.
- Average Annual Count: The average annual count of cancer deaths/incidence in the county.
Determine Counties Meeting Objective: Use this dataset to identify counties that have met or not met the objective death rate threshold of 45.5. Look for entries where Met Objective of 45.5? (1) is marked as True or False.
Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.
Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.
Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.
Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...