This dataset contains the AtollGen pipeline inputs, including:
- sources.tar.xz: all data used for the AtollGen pipeline
  - abioteau (FASTA files and a CSV file)
  - iceberg (v1.0: HTML files)
  - iceberg_v2 (v2.0: FASTA file)
  - islander (SQL dump and SQLite file)
  - iv4 (CSV files)
  - jlao (FirmiData: FASTA files and XLSX files)
- int.hmm / mob.hmm: HMM files for integration and other mobility modules
- hmm_signature_categs.json: mapping file between the signatures recorded in the mobility-module HMM files and the mobility-module categorisation
- Pfam-A.hmm.gz: Pfam-A v34.0 frozen version
- card.json: CARD antibiotic resistance collection file

Defense Finder models (v0.0.3) can be fetched via the macsyfinder download utility (macsydata) on GitHub: https://github.com/gem-pasteur/macsyfinder
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset + notebooks demonstrate feature engineering and ML pipelines on the Titanic dataset.
It includes both manual preprocessing (without pipelines) and end-to-end pipelines using Scikit-Learn.
Feature Engineering is a crucial step in Machine Learning.
In this project, I show:
- Handling missing values with SimpleImputer
- Encoding categorical variables with OneHotEncoder
- Building models manually vs using Pipeline
- Saving models and pipelines with pickle
- Making predictions with and without pipelines (a minimal build-and-save sketch follows this list)
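As a minimal sketch of how these pieces fit together (assuming the standard Titanic columns Age, SibSp, Parch, Fare, Sex and Embarked, a local train.csv, and a LogisticRegression model; the notebook's actual choices may differ):

```python
import pickle
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("train.csv")  # assumed local copy of the Titanic training data
X = df[["Age", "SibSp", "Parch", "Fare", "Sex", "Embarked"]].values
y = df["Survived"].values

preprocess = ColumnTransformer([
    # numeric columns 0-3: fill missing values with the column mean
    ("num", SimpleImputer(strategy="mean"), [0, 1, 2, 3]),
    # categorical columns 4-5: fill missing values, then one-hot encode
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("ohe", OneHotEncoder(handle_unknown="ignore")),
    ]), [4, 5]),
])

pipe = Pipeline([("prep", preprocess), ("model", LogisticRegression(max_iter=1000))])
pipe.fit(X, y)

with open("pipe.pkl", "wb") as f:
    pickle.dump(pipe, f)  # the saved pipeline can then be reloaded for predictions
```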
The models/ folder contains:
- pipe.pkl → complete ML pipeline (recommended for predictions)
- clf.pkl → classifier without a pipeline
- ohe_sex.pkl, ohe_embarked.pkl → encoders for the categorical features

Predict with pipeline:

import pickle
pipe = pickle.load(open("/kaggle/input/featureengineering/models/pipe.pkl", "rb"))
# One passenger in the order [Age, SibSp, Parch, Fare, Sex, Embarked]
sample = [[22, 1, 0, 7.25, 'male', 'S']]
print(pipe.predict(sample))
Predict without pipeline:

import pickle
clf = pickle.load(open("/kaggle/input/featureengineering/models/clf.pkl", "rb"))
ohe_sex = pickle.load(open("/kaggle/input/featureengineering/models/ohe_sex.pkl", "rb"))
ohe_embarked = pickle.load(open("/kaggle/input/featureengineering/models/ohe_embarked.pkl", "rb"))
# Preprocess the input manually with the encoders, then predict with clf (see the sketch below).
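Continuing from the loads above, the manual step might look roughly like the sketch below. The feature order passed to clf and the encoders' output format (dense vs. sparse) are assumptions, not taken from the notebook, so verify them before relying on this:

```python
import numpy as np
from scipy import sparse

def dense(a):
    # The saved encoders may return sparse matrices depending on how they were created.
    return a.toarray() if sparse.issparse(a) else np.asarray(a)

# Raw input, assumed order: Age, SibSp, Parch, Fare, Sex, Embarked
age, sibsp, parch, fare, sex, embarked = 22, 1, 0, 7.25, "male", "S"

sex_enc = dense(ohe_sex.transform([[sex]]))                 # one-hot vector for Sex
embarked_enc = dense(ohe_embarked.transform([[embarked]]))  # one-hot vector for Embarked

# The assembled column order must match what clf was trained on (assumed here).
features = np.hstack([[[age, sibsp, parch, fare]], sex_enc, embarked_enc])
print(clf.predict(features))
```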
🎯 Inspiration
Learn the difference between manual feature engineering and pipeline-based workflows
Understand how to avoid data leakage using Pipeline
Explore cross-validation with pipelines (a short sketch follows this list)
Practice model persistence and deployment strategies
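For the cross-validation point, a short sketch (reusing pipe, X and y from the build sketch above): because the imputer and encoder live inside the Pipeline, they are re-fit on each training fold, so no statistics leak from the held-out fold.

```python
from sklearn.model_selection import cross_val_score

# Preprocessing is re-fit inside every fold, which is what prevents leakage.
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```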
✅ Best Practice: Use pipe.pkl (pipeline) for predictions — it automatically handles preprocessing + modeling in one step!
https://www2.gov.bc.ca/gov/content?id=A519A56BC2BF44E4A008B33FCF527F61
Use this GeoJSON file as an input dataset in Data Pipelines. To get started, follow the steps in the Create your first data pipeline tutorial. To learn more about Data Pipelines, see Introduction to Data Pipelines.
CLARA
This deposit is part of the CLARA project. The CLARA project aims to empower teachers in the task of creating new educational resources, and in particular in handling the licenses of reused educational resources.
The present deposit contains the JSON files extracted from the X5GON PostgreSQL database. The files are fed to the pipeline of the CLARA project for the creation of four different RDF graphs. This is achieved through the use of RDF mappings (RML, RML-star). That pipeline can be found on GitLab.
The results of this pipeline can also be found on Zenodo, on those four different deposits:
Standard reification
Singleton properties
Named graphs
RDF-star
Content
The JSON files contain information on a total of 45K educational resources, linked to a total of 135K subjects (extracted from DBpedia). Each educational resource is linked to the subjects it talks about. Each of those links has two corresponding scores which represent the certainty of the given link. Those scores are "norm_cosine" and "norm_pageRank".
The dataset was cut into multiple JSON files in order to make its processing easier. There are two types of JSON files in this deposit:
authors_[X].json - which lists the authors' names
ER_[X].json - which lists the educational resources and their related information (a loading sketch follows the list of fields below). That information contains:
their title.
their description.
their language (and language_detected; only the former is used in this pipeline).
their license.
their mimetype.
the authors.
the date of creation of the resource.
a url linking to the resource itself.
and finally the subjects (named concepts) associated with the resource, along with the corresponding scores.
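As an illustration only, reading one of the ER files could look like the sketch below; the file name and the JSON key names are assumptions, not taken from the deposit, so check the actual files first:

```python
import json

# "ER_0.json" and the key names below are hypothetical placeholders.
with open("ER_0.json", encoding="utf-8") as f:
    resources = json.load(f)

for er in resources:
    for concept in er.get("concepts", []):
        # Each resource-subject link carries two certainty scores.
        print(er.get("title"), concept.get("label"),
              concept.get("norm_cosine"), concept.get("norm_pageRank"))
```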
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China PMI: Pipeline & Other Transport & Storage: Input Price data was reported at 63.400 % in Dec 2009. This records a decrease from the previous number of 68.300 % for Nov 2009. China PMI: Pipeline & Other Transport & Storage: Input Price data is updated monthly, averaging 60.030 % from Jan 2008 (Median) to Dec 2009, with 24 observations. The data reached an all-time high of 75.640 % in Jun 2008 and a record low of 41.470 % in Mar 2009. China PMI: Pipeline & Other Transport & Storage: Input Price data remains active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s Business and Economic Survey – Table CN.OP: Purchasing Managers' Index: Non Manufacturing: Pipeline & Other Transport & Storage.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a demo dataset to use as input images for the CellProfiler pipeline of CircaSCOPE. Images were acquired with an IncuCyte Zoom microscope (Essen BioScience).
The images are arranged in directories as follows (a path-parsing sketch follows the legend below): YYMM/HH/Vessel/Well-Site-Channel.tif
YY - year
MM - month
HH - hour
Vessel - the vessel number; in this demo, 479 contains the untreated control and 480 contains 100 nM dexamethasone-treated cells
Well - coordinates in the 24-well plate
Site - the field-of-view number inside each well, between 1 and 16
Channel - C1 = green, C2 = red, P = phase
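A small parsing sketch for this layout is shown below; the well coordinate format and the exact separators in the file name are assumptions based on the pattern above:

```python
import re
from pathlib import Path

# Assumed file layout: YYMM/HH/Vessel/Well-Site-Channel.tif
FILENAME = re.compile(r"(?P<well>[^-]+)-(?P<site>\d+)-(?P<channel>C1|C2|P)\.tif$")

def parse_image_path(path: str) -> dict:
    p = Path(path)
    yymm, hour, vessel = p.parts[-4], p.parts[-3], p.parts[-2]
    m = FILENAME.match(p.name)
    return {
        "year": yymm[:2], "month": yymm[2:],  # from the YYMM folder
        "hour": hour,
        "vessel": vessel,
        **(m.groupdict() if m else {}),
    }

print(parse_image_path("2101/03/479/B2-5-C1.tif"))
```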
Unique ID of the registered user
How many days the user was active on the platform in the last 7 days.
Number of products viewed by the user in the last 15 days.
Vintage (in days) of the user as of today.
Most frequently viewed (page loads) product by the user in the last 15 days. If multiple products have a similar number of page loads, consider the most recent one (a pandas sketch of this rule follows the list). If the user has not viewed any product in the last 15 days, set it to Product101.
Most frequently used OS by the user.
Most recently viewed (page loads) product by the user. If the user has not viewed any product, set it to Product101.
Count of page loads in the last 7 days by the user.
Count of clicks in the last 7 days by the user.
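The "most frequently viewed product" rule above can be made concrete with a small pandas sketch; the raw page-view log and its column names are hypothetical, since only the derived features are described here:

```python
import pandas as pd

# Hypothetical page-view log, one row per product page load.
views = pd.DataFrame({
    "user_id":   [1, 1, 1, 2],
    "product":   ["Product7", "Product3", "Product7", "Product9"],
    "timestamp": pd.to_datetime(["2024-01-02", "2024-01-05", "2024-01-06", "2024-01-04"]),
})

def most_viewed_product(user_views: pd.DataFrame) -> str:
    # Count page loads per product; break ties by the most recent view.
    stats = (user_views.groupby("product")["timestamp"]
             .agg(loads="count", last_view="max"))
    return stats.sort_values(["loads", "last_view"], ascending=False).index[0]

most_viewed = views.groupby("user_id")[["product", "timestamp"]].apply(most_viewed_product)
# Users with no views in the window would be assigned "Product101".
print(most_viewed)
```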
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Execution of the preparation pipeline as a single loop over the input file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Lithuania Construction Input Price Index: Waste Water Pipelines data was reported at 123.289 2010=100 in Dec 2017. This records an increase from the previous number of 122.584 2010=100 for Nov 2017. Lithuania Construction Input Price Index: Waste Water Pipelines data is updated monthly, averaging 103.161 2010=100 from Jan 2000 (Median) to Dec 2017, with 216 observations. The data reached an all-time high of 123.289 2010=100 in Dec 2017 and a record low of 70.468 2010=100 in Jan 2002. Lithuania Construction Input Price Index: Waste Water Pipelines data remains active status in CEIC and is reported by Statistics Lithuania. The data is categorized under Global Database’s Lithuania – Table LT.I016: Construction Input Price Index: 2010=100. Rebased from 2010=100 to 2015=100. Replacement series ID: 400954327.
https://www.ibisworld.com/about/termsofuse/
Technological advances in directional drilling and hydraulic fracturing have boosted US oil and gas output to record highs, significantly strengthening the country’s role as a primary energy supplier and exporter. This production boom has supported a steady increase in natural gas liquid production and met global supply needs amid international disruptions, such as the sanctions on Russia’s energy exports. Industrial expansion and a surge in construction activity have also driven up demand for diesel and gasoline, while electric power generator sales have remained strong. In this environment, the industry generated $15.8 billion in revenue for 2025, growing by 1.0% over the year. Despite the moderation in headline growth, profit rose 7.9% in 2025 as operators benefited from high utilization and stable, fee-based contracts. The US refined petroleum pipeline industry has also experienced stable but slowing revenue growth over the last five years, with a current five-year revenue CAGR of 2.3%. Several key trends are shaping industry performance in 2025. Domestic energy production remains robust, supported by volatile but generally elevated energy prices and ongoing industrial demand, particularly in plastics, manufacturing and power generation sectors. Near-term demand has remained resilient even as electric vehicle adoption accelerates and policy shifts gradually favor renewable energy. At the same time, pipeline operators are facing cost headwinds from lingering tariff pressures on imported steel and aluminum, materials critical for new pipeline construction and maintenance. Tariffs have pushed up input costs, prompting companies to focus on efficiency gains and technology investments, such as Smart Grid networks, to optimize operations and safeguard margins. Market consolidation continues as larger operators seek scale in a shifting regulatory landscape, while ongoing geopolitical risks and energy price volatility reinforce the sector’s focus on reliability and logistics innovation. The broader economic environment, including expectations of lower interest rates from the Federal Reserve, will likely sustain liquidity and support capital access for critical infrastructure upgrades. Looking forward, the outlook for the refined petroleum pipeline industry will be defined by a slower growth trajectory and a gradually evolving energy mix. Persistent demand for petroleum-based products in key sectors will be balanced against regulatory uncertainty, evolving energy transition policies and more modest expansion in new pipeline capacity as energy prices ease. Advances in automation and digital pipeline management should partially offset the impact of slower volume growth and rising compliance costs. Over the next five years, industry revenue is expected to increase at a CAGR of 1.2%, reaching $16.8 billion by 2030, with profit growth strengthening from 7.9% in 2025 to an estimated 8.5% by 2030, as operators adapt to the evolving market landscape.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Natural Gas Imports: Pipeline: From Canada: To Warroad, Minnesota data was reported at 237.000 Cub ft mn in Sep 2025. This records an increase from the previous number of 180.000 Cub ft mn for Aug 2025. Natural Gas Imports: Pipeline: From Canada: To Warroad, Minnesota data is updated monthly, averaging 294.000 Cub ft mn from Jan 2011 (Median) to Sep 2025, with 177 observations. The data reached an all-time high of 599.000 Cub ft mn in Jan 2013 and a record low of 147.000 Cub ft mn in Sep 2022. Natural Gas Imports: Pipeline: From Canada: To Warroad, Minnesota data remains active status in CEIC and is reported by U.S. Energy Information Administration. The data is categorized under Global Database’s United States – Table US.RB: Natural Gas Imports: Pipeline: by Point of Entry.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global data pipeline drift detection market size reached USD 1.42 billion in 2024, driven by the increasing complexity of data ecosystems and the need for robust monitoring solutions. The market is expected to grow at a CAGR of 19.6% during the forecast period, reaching USD 6.09 billion by 2033. This rapid growth is attributed to the surge in adoption of artificial intelligence (AI), machine learning (ML), and big data analytics across various industries, which has heightened the demand for real-time data integrity and quality assurance mechanisms.
One of the primary growth factors for the data pipeline drift detection market is the exponential increase in data volumes and the corresponding need to ensure data quality and reliability. As organizations increasingly rely on automated data pipelines to support business intelligence, decision-making, and customer experiences, the risk of data drift—where input data distributions shift from those seen during model training—has become a critical concern. This has led to substantial investments in drift detection technologies that can proactively identify and mitigate anomalies, ensuring that data-driven operations remain accurate and trustworthy. The proliferation of cloud-native architectures and hybrid data environments further amplifies the need for advanced drift detection solutions that can operate seamlessly across diverse infrastructures.
Another significant driver is the regulatory landscape, which is evolving rapidly in response to data privacy, compliance, and governance requirements. Organizations in highly regulated sectors such as BFSI, healthcare, and retail are under increasing pressure to maintain data integrity and demonstrate compliance with standards such as GDPR, HIPAA, and PCI DSS. Data pipeline drift detection tools provide automated monitoring and alerting capabilities that help these organizations detect deviations, maintain audit trails, and ensure continuous compliance. The integration of drift detection with broader data governance frameworks is becoming a best practice, further fueling market growth as enterprises seek to minimize risk and avoid costly data breaches or regulatory penalties.
Technological advancements are also propelling the market forward. The adoption of AI and ML-powered drift detection algorithms enables organizations to detect subtle and complex data drifts that traditional rule-based systems might miss. These intelligent solutions leverage statistical analysis, pattern recognition, and predictive analytics to provide real-time insights into data pipeline health. Furthermore, the rise of DevOps and DataOps practices is driving the need for automated, scalable, and easily deployable drift detection solutions that can integrate with existing data management workflows. The increasing availability of open-source drift detection frameworks is lowering barriers to entry, enabling even small and medium-sized enterprises to benefit from advanced monitoring capabilities.
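As a toy illustration of the statistical techniques mentioned above (not tied to any particular vendor tool), a two-sample Kolmogorov-Smirnov test can flag when a numeric feature's live distribution has drifted away from its training baseline:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # baseline seen at training time
live_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)   # shifted production data

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Drift detected: KS statistic={stat:.3f}, p={p_value:.2e}")
else:
    print("No significant drift detected")
```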
From a regional perspective, North America continues to dominate the data pipeline drift detection market, accounting for the largest share in 2024. This leadership is supported by the region's mature IT infrastructure, high adoption of cloud technologies, and the presence of leading technology vendors. However, Asia Pacific is emerging as the fastest-growing region, with a projected CAGR of over 22% through 2033. The rapid digital transformation across sectors in countries like China, India, and Japan, combined with increasing investments in data-driven initiatives, is accelerating demand for drift detection solutions. Europe also represents a significant market, driven by stringent data privacy regulations and a strong focus on data governance across industries.
The component segment of the data pipeline drift detection market is bifurcated into software and services, each playing a pivotal role in the adoption and implementation of drift detection solutions. Software solutions are at the core of this market, encompassing a wide array of tools and platforms designed to automate the detection of data drifts, monitor model performance, and generate actionable alerts. These solutions leverage advanced analytics, AI, and machine learning algorithms to provide real-time insights into data pipeline health. The software segment is wi
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Natural Gas Imports: Pipeline: From Canada: To Highgate Springs, Vermont data was reported at 12,528.000 Cub ft mn in 2024. This records an increase from the previous number of 12,494.000 Cub ft mn for 2023. Natural Gas Imports: Pipeline: From Canada: To Highgate Springs, Vermont data is updated yearly, averaging 9,319.000 Cub ft mn from Dec 1996 (Median) to 2024, with 29 observations. The data reached an all-time high of 14,574.000 Cub ft mn in 2016 and a record low of 7,680.000 Cub ft mn in 1998. Natural Gas Imports: Pipeline: From Canada: To Highgate Springs, Vermont data remains active status in CEIC and is reported by U.S. Energy Information Administration. The data is categorized under Global Database’s United States – Table US.RB030: Natural Gas Imports: Pipeline: by Point of Entry: Annual.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Corpora used to calculate commonly occurring workflow fragments from the LONI Pipeline
Taken from the README of the google-research/big_transfer repo:
by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby
In this repository we release multiple models from the Big Transfer (BiT): General Visual Representation Learning paper that were pre-trained on the ILSVRC-2012 and ImageNet-21k datasets. We provide the code to fine-tune the released models in the major deep learning frameworks: TensorFlow 2, PyTorch and Jax/Flax.
We hope that the computer vision community will benefit by employing more powerful ImageNet-21k pretrained models as opposed to conventional models pre-trained on the ILSVRC-2012 dataset.
We also provide colabs for a more exploratory interactive use: a TensorFlow 2 colab, a PyTorch colab, and a Jax colab.
Make sure you have Python>=3.6 installed on your machine.
To set up TensorFlow 2, PyTorch or Jax, follow the instructions provided in the corresponding repository linked here.
In addition, install the Python dependencies by running (please select tf2, pytorch or jax in the command below):
pip install -r bit_{tf2|pytorch|jax}/requirements.txt
First, download the BiT model. We provide models pre-trained on ILSVRC-2012 (BiT-S) or ImageNet-21k (BiT-M) for 5 different architectures: ResNet-50x1, ResNet-101x1, ResNet-50x3, ResNet-101x3, and ResNet-152x4.
For example, if you would like to download the ResNet-50x1 pre-trained on ImageNet-21k, run the following command:
wget https://storage.googleapis.com/bit_models/BiT-M-R50x1.{npz|h5}
Other models can be downloaded accordingly by plugging the name of the model (BiT-S or BiT-M) and architecture in the above command.
Note that we provide models in two formats: npz (for PyTorch and Jax) and h5 (for TF2). By default we expect that model weights are stored in the root folder of this repository.
Then, you can run fine-tuning of the downloaded model on your dataset of interest in any of the three frameworks. All frameworks share the command line interface
python3 -m bit_{pytorch|jax|tf2}.train --name cifar10_`date +%F_%H%M%S` --model BiT-M-R50x1 --logdir /tmp/bit_logs --dataset cifar10
Currently, all frameworks will automatically download the CIFAR-10 and CIFAR-100 datasets. Other public or custom datasets can be easily integrated: in TF2 and JAX we rely on the extensible TensorFlow Datasets library. In PyTorch, we use torchvision’s data input pipeline.
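For the PyTorch path, a generic torchvision input pipeline for a custom image folder might look like the sketch below; this is illustrative only and is not the repository's own loader or its integration point for custom datasets:

```python
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

# Assumes an ImageFolder-style layout: my_dataset/<class_name>/<image>.jpg
dataset = datasets.ImageFolder("my_dataset", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)

images, labels = next(iter(loader))
print(images.shape, labels.shape)
```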
Note that our code uses all available GPUs for fine-tuning.
We also support training in the low-data regime via the `--examples_per_class` option.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
⚠️⚠️⚠️ NSFW Content Warning ⚠️⚠️⚠️ This dataset/model contains content that may be offensive or inappropriate for some users, including NSFW (Not Safe For Work) material. Please proceed with caution.
DataFlow demo -- Text Pipeline
This dataset card serves as a demo for showcasing the Text data processing pipeline of the Dataflow Project. It provides an intuitive view of the pipeline’s input dirty data and filtered outputs.
Overview
The purpose of the Text Pipeline is to… See the full description on the dataset page: https://huggingface.co/datasets/OpenDCAI/dataflow-demo-Text.
This dataset contains Natural Gas Imports by Entry Point - International pipelines. Follow datasource.kapsarc.org for timely data to advance energy economics research. Notes: CORES uses GWh as the unit of measure for natural gas.
In terms of transportation costs, the most expensive import source for natural gas in Italy was Norway: on average, a metric ton of Norwegian gas imported via the Griess mountain pass cost ** euros in 2018. By contrast, Russian gas had the lowest transport costs, as it cost about **** euros to import a metric ton of gas via the Tarvisio-Malborghetto pipeline.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains example input data, including raw images, codebooks, parameters, and segmentation labels needed to run the FISH spatial transcriptomics pipeline tool PIPEFISH. The datasets contained are:
in situ sequencing (ISS) of a whole coronal slice of a mouse brain (50 genes). Link to publication.
Gataric, M., Park, J.S., Li, T., Vaskivskyi, V., Svedlund, J., Strell, C., Roberts, K., Nilsson, M., Yates, L.R., Bayraktar, O. and Gerstung, M., 2021. PoSTcode: Probabilistic image-based spatial transcriptomics decoder. bioRxiv, pp.2021-10.
MERFISH of human U2-OS cell cultures (130 genes). Link to publication.
Moffitt, J.R., Hao, J., Wang, G., Chen, K.H., Babcock, H.P. and Zhuang, X., 2016. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proceedings of the National Academy of Sciences, 113(39), pp.11046-11051.
seqFISH of a developing mouse embryo (351 genes). Link to publication.
Lohoff, T., Ghazanfar, S., Missarova, A., Koulena, N., Pierson, N., Griffiths, J.A., Bardot, E.S., Eng, C.H., Tyser, R.C.V., Argelaguet, R. and Guibentif, C., 2022. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nature biotechnology, 40(1), pp.74-85.
To format the inputs correctly, run the prep_input.py script for the dataset you wish to process, from the same directory as the script.
Memory requirements for each dataset:
iss_mouse_brain - 3GB
merfish_human_u2os - 7GB
seqfish_mouse_embryo - 37GB
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recombinant Read Extraction Pipeline with Test Input Data

Description: This dataset showcases the Recombinant Read Extraction Pipeline, previously developed by us (https://doi.org/10.6084/m9.figshare.26582380), designed for the detection of recombination events in sequencing data. The pipeline enables the alignment of sequence reads to a reference genome, generation of SNP strings, identification of haplotypes, extraction of recombinant sequences, and comprehensive result compilation into an Excel summary for seamless analysis.

Included in this dataset:
- config.json: Configuration file with default settings.
- pipeline_test_reads.fa: A test FASTA file containing simulated recombination and allele replacement events, specifically:
  - Two recombination events, each covered by 15 reads, transitioning between Solanum lycopersicum cv. Moneyberg and Moneymaker haplotypes.
  - One recombination event covered by 20 reads, involving a switch at the extremity of the analysed amplicon from the Moneymaker to the Moneyberg haplotype.
  - One allele replacement event covered by 20 reads, featuring recombination from Moneymaker to Moneyberg and back to Moneymaker.
  - Wild-type Solanum lycopersicum cv. Moneyberg and Moneymaker sequences.
- final_output.xlsx: Example output summarizing read names, sequences, and read counts.

Usage instructions:
- Install dependencies: Follow the installation guidelines to set up the required software and Python libraries (please refer to https://doi.org/10.6084/m9.figshare.26582380).
- Configure pipeline: Customize parameters in config.json as needed.
- Run pipeline: Execute the pipeline using the provided script to process the test input file.
- Review outputs: Examine final_output.xlsx to verify the detection and summarization of recombinant events.

The dataset pipeline_test_reads.fa serves as a control dataset designed to verify the functionality of the Recombinant Read Extraction Pipeline previously described (https://doi.org/10.6084/m9.figshare.26582380). This dataset contains artificially generated "reads" and does not include any genuine DNA sequencing data.

Keywords: Genomic Data Processing, Recombinant Detection, Haplotype Analysis, Bioinformatics Pipeline, SNP Analysis