MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
Baseline results
You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.
Other results
Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website.
Dataset layout
Python / Matlab versions
I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.
The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary:
```python
def unpickle(file):
    import cPickle
    with open(file, 'rb') as fo:
        dict = cPickle.load(fo)
    return dict
```
And a python3 version:
```python
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
```
Loaded in this way, each of the batch files contains a dictionary with the following elements:
data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
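As a minimal sketch of how the data and labels entries fit together (assuming NumPy and the Python 3 unpickle above, with the batch files in the working directory; dictionary keys are bytes because of encoding='bytes'):

```python
import numpy as np

batch = unpickle("data_batch_1")
data = batch[b'data']            # (10000, 3072) uint8: 1024 R + 1024 G + 1024 B per row
labels = batch[b'labels']        # list of 10000 ints in 0-9

# Reshape each 3072-entry row into a 32x32x3 (height, width, channel) image.
images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
print(images.shape, labels[0])   # (10000, 32, 32, 3) and the first image's label
```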
The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:
label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.
Binary version
The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows:
<1 x label><3072 x pixel>
...
<1 x label><3072 x pixel>
In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.
Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.
There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.
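A minimal reading sketch for the binary version, assuming NumPy and that the .bin files and batches.meta.txt sit in the working directory:

```python
import numpy as np

# Each record is 1 label byte followed by 3072 pixel bytes (R, then G, then B planes).
raw = np.fromfile("data_batch_1.bin", dtype=np.uint8).reshape(-1, 3073)
labels = raw[:, 0].astype(np.int64)                                # values 0-9
images = raw[:, 1:].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)   # (10000, 32, 32, 3) uint8

with open("batches.meta.txt") as f:
    label_names = [line.strip() for line in f if line.strip()]
print(images.shape, label_names[labels[0]])
```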
The CIFAR-100 dataset
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...
Neural Net Model Parameter files for my kernel: https://www.kaggle.com/whatsthevariance/diagnosing-skin-cancer-with-bagged-neural-nets
350 epochs' worth of training can be loaded from these files using PyTorch.
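A minimal loading sketch, assuming PyTorch; the file name and model class below are hypothetical stand-ins for the parameter files and architecture defined in the linked kernel:

```python
import torch

# Hypothetical file name; use the actual parameter files attached to this dataset.
state_dict = torch.load("bagged_net_0.pt", map_location="cpu")

# model = BaggedNet()                # the architecture defined in the linked kernel
# model.load_state_dict(state_dict)
# model.eval()
```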
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Corresponding author: Peng Hou (houpcy@163.com)
Abstract: The Tibetan Plateau (TP), one of the most climate-sensitive regions on Earth, plays a crucial role in global carbon cycling. However, the spatiotemporal variability of modeled above- and below-ground net primary production (ANPP and BNPP) remains uncertain across linear (LL), machine learning (ML), and deep learning (DL) models, particularly for BNPP. To address this gap, we applied 96 data-driven models, including LL, ML, and DL approaches, combined with 5-fold cross-validation and Monte Carlo simulations to estimate ANPP and BNPP at 1 km resolution from 1981 to 2018 across the TP. The results showed that the best-performing models achieved R2 values ranging from 0.80 to 0.88 for ANPP and from 0.89 to 0.95 for BNPP. Spatiotemporal patterns of ANPP and BNPP were generally consistent across model types. However, total ANPP exhibited a significant declining trend at −0.003 Pg C yr−1, while BNPP increased by 0.001 to 0.003 Pg C yr−1. Notably, inter-model variability in annual totals reached up to 0.13 and 0.32 Pg C yr−1 for ANPP and BNPP, respectively. These discrepancies likely stem from differences in how models interpret input variable contributions, as reflected in distinct spatial patterns, particularly in DL simulations, which showed divergence in ANPP across the southern TP (e.g., Nyingchi) and BNPP in northern to central regions (e.g., from Xining to Zhiduo). Our findings offer a robust methodological benchmark for modeling ecosystem carbon allocation under climate change and provide valuable insights for adaptive carbon management in one of the world’s most vulnerable regions.
Filename: ANPP_xgblinear_Tibetan_1981-2018_1km_tif.zip; ANPP_Rborist_Tibetan_1981-2018_1km_tif.zip; ANPP_HYFIS_Tibetan_1981-2018_1km_tif.zip; BNPP_xgblinear_Tibetan_1981-2018_1km_tif.zip; BNPP_xgbDART_Tibetan_1981-2018_1km_tif.zip; BNPP_monmlp_Tibetan_1981-2018_1km_tif.zip.
File information: The name of each Zip compressed file is composed of the observation object, simulation model, region, time range, spatial resolution, and data format. For example, 'ANPP_xgblinear_Tibetan_1981-2018_1km_tif.zip' consists of ANPP (observation object) + '_' + xgblinear (simulation model) + '_' + Tibetan (region) + '_' + 1981-2018 (time range) + '_' + 1km (spatial resolution) + '_' + tif (data format) + '.zip'. Among them, ANPP is Aboveground Net Primary Production and BNPP is Belowground Net Primary Production; xgbLinear is a linear model, Rborist/xgbTree are machine learning models, and HYFIS/monmlp are deep learning models. The unit for these data is 'g C m-2 yr-1'.
Author contributions: Tao Zhou contributed to the conceptualization, methodology, software, and writing - original draft, review, editing; Benjamin Laffitte, Jianfei Cao, Xuwei Sun and Guangjin Zhou supervised manuscript writing; Yuting Hou contributed to the data curation and software; Peng Hou contributed to data curation, writing - original draft preparation, software, and writing - review, editing; all authors contributed to the final preparation of the manuscript.
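As a sketch of the file-naming convention described above (standard-library Python; assumes the canonical form of the name without stray spaces):

```python
name = "ANPP_xgblinear_Tibetan_1981-2018_1km_tif.zip"
target, model, region, period, resolution, fmt = name.rsplit(".", 1)[0].split("_")
print(target, model, region, period, resolution, fmt)
# ANPP xgblinear Tibetan 1981-2018 1km tif
```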
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are extended versions of the MM-IMDB [Arevalo+ ICLRW'17] and Ads-Parallelity [Zhang+ BMVC'18] datasets, with features from the Google Cloud Vision API. These datasets are stored in jsonl (JSON Lines) format.
Abstract (from our paper):
There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM2S2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to enhance the importance of elements with modality-level granularity further. Our concept exhibits performance that is comparable to or better than the previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.
Dataset (MM-IMDB and Ads-Parallelity):
We extended two multimodal datasets, namely MM-IMDB [Arevalo+ ICLRW'17] and Ads-Parallelity [Zhang+ BMVC'18], for the empirical experiments. The MM-IMDB dataset contains 25,925 movies with multiple labels (genres). We used the original split provided in the dataset and reported the F1 scores (micro, macro, and samples) of the test set. The Ads-Parallelity dataset contains 670 images and slogans from persuasive advertisements to understand the implicit relationship (parallel and non-parallel) between these two modalities. A binary classification task is used to predict whether the text and image in the same ad convey the same message.
We transformed the following multimodal information (i.e., visual, textual, and categorical data) into textual tokens and fed these into our proposed model. We used the Google Cloud Vision API for the visual features to obtain the following four pieces of information as tokens: (1) text from the OCR, (2) category labels from the label detection, (3) object tags from the object detection, and (4) the number of faces from the facial detection. We input the labels and object detection results as a sequence in order of confidence, as obtained from the API. We describe the visual, textual, and categorical features of each dataset below.
MM-IMDB: We used the title and plot of movies as the textual features, and the aforementioned API results based on poster images as visual features.
Ads-Parallelity: We used the same API-based visual features as in MM-IMDB. Furthermore, we used textual and categorical features consisting of textual inputs of transcriptions and messages, and categorical inputs of natural and text concrete images.
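A minimal sketch of the token construction described above, assuming the Vision API outputs were exported to the jsonl records as lists of description/score pairs (all field names here are illustrative, not the dataset's actual schema):

```python
# Hypothetical record layout for one example.
record = {
    "ocr_text": "SUMMER SALE 50% OFF",
    "labels": [{"description": "poster", "score": 0.93}, {"description": "font", "score": 0.88}],
    "objects": [{"name": "person", "score": 0.81}],
    "num_faces": 1,
}

def visual_tokens(rec):
    # Keep labels and objects in order of confidence, as described above.
    labels = [x["description"] for x in sorted(rec["labels"], key=lambda x: -x["score"])]
    objects = [x["name"] for x in sorted(rec["objects"], key=lambda x: -x["score"])]
    return rec["ocr_text"].split() + labels + objects + [f"faces_{rec['num_faces']}"]

print(visual_tokens(record))
```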
This is an example data source which can be used for Predictive Maintenance model building. It consists of the following data:
Telemetry Time Series Data (PdM_telemetry.csv): It consists of hourly averages of voltage, rotation, pressure, and vibration collected from 100 machines for the year 2015.
Errors (PdM_errors.csv): These are errors encountered by the machines while in operating condition. Since these errors don't shut down the machines, they are not considered failures. The error dates and times are rounded to the closest hour since the telemetry data is collected at an hourly rate.
Maintenance (PdM_maint.csv): If a component of a machine is replaced, that is captured as a record in this table. Components are replaced under two situations: 1. During a regular scheduled visit, the technician replaces it (Proactive Maintenance); 2. A component breaks down and the technician does an unscheduled maintenance to replace it (Reactive Maintenance). The latter is considered a failure, and the corresponding data is captured under Failures. Maintenance data has both 2014 and 2015 records. This data is rounded to the closest hour since the telemetry data is collected at an hourly rate.
Failures (PdM_failures.csv): Each record represents replacement of a component due to failure. This data is a subset of Maintenance data. This data is rounded to the closest hour since the telemetry data is collected at an hourly rate.
Metadata of Machines (PdM_Machines.csv): Model type & age of the Machines.
This dataset was available as a part of Azure AI Notebooks for Predictive Maintenance. But as of 15th Oct, 2020 the notebook (link) is no longer available. However, the data can still be downloaded using the following URLs:
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_telemetry.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_errors.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_maint.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_failures.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_machines.csv
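A minimal loading sketch, assuming pandas and that the blob URLs above are still reachable (the datetime column name is an assumption about the CSV headers):

```python
import pandas as pd

base = "https://azuremlsampleexperiments.blob.core.windows.net/datasets/"
telemetry = pd.read_csv(base + "PdM_telemetry.csv", parse_dates=["datetime"])  # hourly sensor averages
errors = pd.read_csv(base + "PdM_errors.csv", parse_dates=["datetime"])
maint = pd.read_csv(base + "PdM_maint.csv", parse_dates=["datetime"])
failures = pd.read_csv(base + "PdM_failures.csv", parse_dates=["datetime"])
machines = pd.read_csv(base + "PdM_machines.csv")

print(telemetry.head())
```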
Try to use this data to build Machine Learning models related to Predictive Maintenance.
The following are specific objectives for the spring CalCOFI.
- Continuously sample pelagic fish eggs using the Continuous Underway Fish Egg Sampler (CUFES). The data will be used to estimate the distributions and abundances of spawning hake, anchovy, mackerel, and early spawning Pacific sardine.
- Continuously sample multi-frequency acoustic backscatter using the Simrad EK80 and the Simrad ME70. The data will be used to estimate the distributions and abundances of coastal pelagic fishes (e.g., sardine, anchovy, and mackerel) and krill species.
- Continuously sample sea-surface temperature, salinity, and chlorophyll-a using a thermosalinometer and fluorometer. These data will be used to estimate the physical oceanographic habitats for target species.
- Continuously sample air temperature, barometric pressure, and wind speed and direction using an integrated weather station.
- Sample profiles of seawater temperature, salinity, chlorophyll-a, nutrients, and phytoplankton using a CTD with water-sampling rosette and other instruments at prescribed stations. Measurements of extracted chlorophyll and phaeophytin will be obtained with a fluorometer. Nutrients will be measured with an auto-analyzer.
- Sample the light intensity in the photic zone using a standard Secchi disk in conjunction with a daytime CTD station.
- Sample plankton using a CalBOBL (CalCOFI Bongo Oblique) at prescribed stations. These data will be used to estimate the distributions and abundances of ichthyoplankton and zooplankton species.
- Sample plankton using a Manta (neuston) net at prescribed stations. These data will be used to estimate the distributions and abundances of ichthyoplankton species.
- Sample the vertically integrated abundance of fish eggs using a Pairovet net at prescribed stations. These data will be used to quantify the abundances and distributions of fish eggs.
- Sample plankton using a PRPOOS (Planktonic Rate Processes in Oligotrophic Ocean Systems) net at all prescribed CalCOFI stations on lines 90.0 and 80.0 as well as stations out to and including station 70.0 on lines 86.7 and 83.3 and station 81.8 46.9. PRPOOS will not be towed on SCCOOS stations. These data will be used in analyses by the LTER (Long Term Ecological Research) project.
- Continuously sample profiles of currents using the RDI/Teledyne Acoustic Doppler Current Profiler. This will be dependent on the ability to synchronize the ADCP’s output with the EK80 and ME70. The EK80 and ME70 will hold priority over the ADCP.
- Continuously observe, during daylight hours, seabirds and marine mammals. These data will be used to estimate the distributions and abundances of seabirds and marine mammals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
From the parent record held in the GCMD:
The data sets in the CDC archive called "Reynolds SST" and "Reconstructed Reynolds SST" were discontinued on 1 April 2003.
A new OI SST data set is available as described here, which includes a new analysis for the historical data and updates into the future. NCEP will not provide new data for the "Reynolds SST" after December 2002 and CDC will remove the "Reynolds SST" data set on 1 April 2003.
TO SEE THE NEW DATASET, PLEASE SEARCH THE GLOBAL CHANGE MASTER DIRECTORY FOR MORE INFORMATION. REFER TO THE METADATA RECORD (LINKED BELOW): REYNOLDS_SST
This metadata record is a modified child record of an original parent record registered at the Global Change Master Directory. (The Entry ID of the parent record is REYNOLDS_SST, and can be found on the GCMD website - see the provided URL). The data described here are a subset of the original dataset. This metadata record has been created for the express use of Australian Government Antarctic Division employees.
Reproduced from: http://www.emc.ncep.noaa.gov/research/cmb/sst_analysis/
Analysis Description and Recent Reanalysis
The optimum interpolation (OI) sea surface temperature (SST) analysis is produced weekly on a one-degree grid. The analysis uses in situ and satellite SSTs plus SSTs simulated by sea ice cover. Before the analysis is computed, the satellite data are adjusted for biases using the method of Reynolds (1988) and Reynolds and Marsico (1993). A description of the OI analysis can be found in Reynolds and Smith (1994). The bias correction improves the large scale accuracy of the OI.
In November 2001, the OI fields were recomputed for late 1981 onward. The new version will be referred to as OI.v2.
The most significant change for the OI.v2 is the improved simulation of SST obs from sea ice data following a technique developed at the UK Met Office. This change has reduced biases in the OI SST at higher latitudes. Also, the update and extension of COADS has provided us with improved ship data coverage through 1997, reducing the residual satellite biases in otherwise data sparse regions.
The data are available in the following formats: NetCDF, flat binary files, and text.
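A minimal reading sketch for the NetCDF version, assuming xarray; the file and variable names are hypothetical, since the archive's layout is not described here:

```python
import xarray as xr

ds = xr.open_dataset("oiv2_weekly_sst.nc")   # hypothetical file name
print(ds)                                    # inspect the weekly one-degree grid and variables
sst = ds["sst"]                              # assumed variable name for the SST field
```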
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we provide CO2-system properties that were continuously measured along a southeast-northwest transect in the South Atlantic Ocean in which six Agulhas eddies were sampled. The Following Ocean Rings in the South Atlantic (FORSA) cruise occurred between 27th June and 15th July 2015, from Cape Town (South Africa) to Arraial do Cabo (Brazil), on board the first research cruise of the Brazilian Navy RV Vital de Oliveira, as part of an effort of the Brazilian High Latitude Oceanography Group (GOAL). It also contributed to the activities developed by the following Brazilian networks: GOAL, the Brazilian Ocean Acidification Network (BrOA), and the Brazilian Research Network on Global Climate Change (Rede CLIMA). The focus of the first study using this dataset (Orselli et al. 2019a) was to investigate the role played by the Agulhas eddies on the sea-air CO2 net flux along their trajectories through the South Atlantic Ocean and to model the seawater CO2-related properties as a function of environmental parameters. These data have been used to contribute to the scientific discussion about the impact of Agulhas eddies on changes in the marine carbonate system, which is an expanding oceanographic subject (Carvalho et al. 2019; Orselli et al. 2019b; Ford et al. 2023).
Seawater and atmospheric CO2 molar fractions (xCO2sw and xCO2atm, respectively) were continuously measured along the cruise track, as well as the sea surface temperature (T) and salinity (S). The sampling methodology is fully described in Orselli et al. (2019a). The underway xCO2 sampling was taken using an autonomous system GO-8050, General Oceanic®, equipped with a non-dispersive infrared gas analyzer (LI-7000, LI-COR®). The underway T and S were sampled using a Sea-Bird® Thermosalinograph SBE21. The seawater intake feeding the continuous GO-8050 and SBE21 systems was set at ~5 m below the sea surface. The xCO2 system was calibrated with four standard gases (CO2 concentrations of 0, 202.10, 403.20, and 595.50 uatm) within a 12 h interval along the entire cruise. Every 3 h the system underwent a standard reading, to check the derivation and allow the xCO2 corrections. The xCO2 measurements were taken at 90-second intervals. After a hundred xCO2sw readings, the system was switched to the atmosphere and five xCO2atm readings were taken (Pierrot et al., 2009). xCO2 (umol mol-1) inputs were corrected using the CO2 standards (Pierrot et al., 2009). Thermosalinograph data were corrected using the CTD surface data.
Then, together with the pressure data, these data were used to calculate the pCO2 of the equilibrator and atmosphere (pCO2eq and pCO2atm, respectively, uatm), following Weiss & Price (1980). Using the pCO2eq, which is calculated at the equilibrator temperature, it is possible to calculate the pCO2 at the in situ temperature (pCO2sw, uatm), according to Takahashi et al. (2009). Another common calculation regarding pCO2sw data is the temperature-normalized pCO2sw (NpCO2sw, uatm). This means that the temperature effect is removed when one calculates the NpCO2sw for the mean cruise temperature. The procedure followed Takahashi et al. (2009) and considered the mean cruise temperature of 20.39°C. The results obtained allow one to investigate the exchanges of CO2 at the ocean-atmosphere interface by calculating the pCO2 difference between these two reservoirs (DeltapCO2, DpCO2=pCO2sw-pCO2atm, uatm). Negative (positive) DpCO2 results indicate that the ocean acts as a CO2 sink (source) for the atmosphere.
To determine the FCO2, the monthly mean wind speed data of July 2015 (at 10 m height) were extracted from the ERA-Interim atmospheric reanalysis product of the European Centre for Medium-Range Weather Forecasts (http://apps.ecmwf.int/datasets/data/interim-full-moda/levtype=sfc/), since the use of long-term means is usual (e.g., Takahashi et al., 2009). The average wind speed for the period and whole area was 6.8 ± 0.6 m s−1, ranging from 5.6 to 8.3 m s−1. The CO2 transfer coefficients proposed by Takahashi et al. (2009) and Wanninkhof (2014) were used. With all these data together, the FCO2 was determined according to Broecker & Peng (1982), where FCO2 is the sea-air CO2 net flux (mmol m−2 d−1); FT09 and FW14 are the sea-air CO2 fluxes calculated using the coefficients described in Takahashi et al. (2009) and Wanninkhof (2014), respectively.
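A minimal sketch of the DpCO2 and temperature-normalization steps described above (illustrative numbers only; the 0.0423 per-degree factor is the standard Takahashi-type empirical coefficient and is used here purely for illustration):

```python
import math

# Illustrative values, not measurements from the cruise.
pco2_sw, pco2_atm = 385.0, 398.0      # uatm, at in situ temperature
t_insitu, t_mean = 23.1, 20.39        # degrees C; 20.39 is the mean cruise temperature

dpco2 = pco2_sw - pco2_atm            # negative => the ocean acts as a CO2 sink

# Temperature-normalized pCO2sw at the mean cruise temperature.
npco2_sw = pco2_sw * math.exp(0.0423 * (t_mean - t_insitu))

print(f"DpCO2 = {dpco2:.1f} uatm, NpCO2sw = {npco2_sw:.1f} uatm")
```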
Survey the distributions and abundances of pelagic fish stocks, their prey, and their biotic and abiotic environments in the area of the California Current between San Francisco, California and San Diego, California. The following are specific objectives for the spring CalCOFI.
I.D.1. Continuously sample pelagic fish eggs using the Continuous Underway Fish Egg Sampler (CUFES). The data will be used to estimate the distributions and abundances of spawning hake, anchovy, mackerel, and Pacific sardine.
I.D.2. Continuously sample sea-surface temperature, salinity, and chlorophyll-a using a thermosalinometer and fluorometer. These data will be used to estimate the physical oceanographic habitats for target species.
I.D.3. Continuously sample air temperature, barometric pressure, and wind speed and direction using an integrated weather station.
I.D.4. Sample profiles of seawater temperature, salinity, chlorophyll-a, nutrients, and phytoplankton using a CTD with water-sampling rosette and other instruments at prescribed stations. Measurements of extracted chlorophyll and phaeophytin will be obtained with a fluorometer. Primary production will be measured as C14 uptake in a six hour in situ incubation. Nutrients will be measured with an auto-analyzer. These data will be used to estimate primary productivity and the biotic and abiotic habitats for target species.
I.D.5. Sample the light intensity in the photic zone using a standard Secchi disk once per day in conjunction with a daytime CTD station. These data will be used to interpret the measurements of primary production.
I.D.6. Sample plankton using a CalBOBL (CalCOFI Bongo Oblique) at prescribed stations. These data will be used to estimate the distributions and abundances of ichthyoplankton and zooplankton species.
I.D.7. Sample plankton using a Manta (neuston) net at prescribed stations. These data will be used to estimate the distributions and abundances of ichthyoplankton species.
I.D.8. Sample the vertically integrated abundance of fish eggs using a Pairovet net at prescribed stations. These data will be used to quantify the abundances and distributions of fish eggs.
I.D.9. Sample plankton using a PRPOOS (Planktonic Rate Processes in Oligotrophic Ocean Systems net) at all prescribed CalCOFI stations on lines 90.0 and 80.0 as well as stations out to and including station 70.0 on lines 86.7 and 83.3 and station 81.8 46.9. PRPOOS will not be towed on SCCOOS stations. These data will be used in analyses by the LTER (Long Term Ecological Research) project.
I.D.10. Continuously sample profiles of currents using the RDI/Teledyne Acoustic Doppler Current Profiler.
I.D.11. Continuously observe, during daylight hours, seabirds and mammals. These data will be used to estimate the distributions and abundances of seabirds and marine mammals.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Data sources for Badger, an open-source budget execution and data analysis tool for federal budget analysts at the Environmental Protection Agency. Badger is based on WPF and .NET 6 and is written in C#.
Databases play a critical role in environmental data analysis by providing a structured system to store, organize, and efficiently retrieve large amounts of data, allowing analysts to easily access and manipulate information needed to extract meaningful insights through queries and analysis tools; essentially acting as the central repository for data used in data analysis processes. Badger provides the following providers to store and analyze data locally.
bin - Binaries are included in the bin folder due to the complex Baby setup required. Don't empty this folder.
bin/storage - HTML and JS required for the downloads manager and custom error pages
_Environmental...
Representative dairy farms in major dairy regions of the United States were modeled using the Integrated Farm System Model to quantify potential reductions in greenhouse gas emissions using various mitigation strategies. Important data and information describing these 14 farms are documented in this table. These data include the farm location, number of cows and heifers maintained, milk produced, feeds and nutrient contents fed, crop areas, crop yields, fertilizer and lime application rates, irrigation water applied, milking and housing facilities, manure collection, storage and application methods used, and soil characteristics. Simulated output information for feed consumption, nutrient losses, fossil energy use, water use, and greenhouse gas emissions are listed for each farm. These data are published as supplementary information for the article “Strategies for mitigating greenhouse gas emissions from US dairy farms toward a net zero goal” published in the Journal of Dairy Science.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General overview
The following datasets are described by this metadata record, and are available for download from the provided URL.
####
Physical parameters raw log files
Raw log files
1) DATE=
2) Time= UTC+11
3) PROG=Automated program to control sensors and collect data
4) BAT=Amount of battery remaining
5) STEP=check aquation manual
6) SPIES=check aquation manual
7) PAR=Photoactive radiation
8) Levels=check aquation manual
9) Pumps= program for pumps
10) WQM=check aquation manual
####
Respiration/PAM chamber raw excel spreadsheets
Abbreviations in headers of datasets
Note: Two data sets are provided in different formats, raw and cleaned (adj). These are the same data with the PAR column moved over to PAR.all for analysis. All headers are the same. The cleaned (adj) dataframe will work with the R syntax below; alternatively, add code to do the cleaning in R.
Date: ISO 1986 - Check
Time: UTC+11 unless otherwise stated
DATETIME: UTC+11 unless otherwise stated
ID (of instrument in respiration chambers)
ID43=Pulse amplitude fluorescence measurement of control
ID44=Pulse amplitude fluorescence measurement of acidified chamber
ID=1 Dissolved oxygen
ID=2 Dissolved oxygen
ID3= PAR
ID4= PAR
PAR=Photo active radiation umols
F0=minimal fluorescence from PAM
Fm=Maximum fluorescence from PAM
Yield=(F0 – Fm)/Fm
rChl=an estimate of chlorophyll (Note this is uncalibrated and is an estimate only)
Temp=Temperature degrees C
PAR=Photo active radiation
PAR2= Photo active radiation2
DO=Dissolved oxygen
%Sat= Saturation of dissolved oxygen
Notes=This is the program of the underwater submersible logger with the following abbreviations:
Notes-1) PAM=
Notes-2) PAM=Gain level set (see aquation manual for more detail)
Notes-3) Acclimatisation= Program of slowly introducing treatment water into chamber
Notes-4) Shutter start up 2 sensors+sample…= Shutter PAMs automatic set up procedure (see aquation manual)
Notes-5) Yield step 2=PAM yield measurement and calculation of control
Notes-6) Yield step 5= PAM yield measurement and calculation of acidified
Notes-7) Abatus respiration DO and PAR step 1= Program to measure dissolved oxygen and PAR (see aquation manual). Steps 1-4 are different stages of this program including pump cycles, DO and PAR measurements.
8) Rapid light curve data
Pre LC: A yield measurement prior to the following measurement
After 10.0 sec at 0.5% to 8%: Level of each of the 8 steps of the rapid light curve
Odessey PAR (only in some deployments): An extra measure of PAR (umols) using an Odessey data logger
Dataflow PAR: An extra measure of PAR (umols) using a Dataflow sensor
PAM PAR: This is copied from the PAR or PAR2 column
PAR all: This is the complete PAR file and should be used
Deployment: Identifying which deployment the data came from
####
Respiration chamber biomass data
The data are chlorophyll a biomass from cores taken from the respiration chambers. The headers are:
Depth (mm)
Treat (Acidified or control)
Chl a (pigment and indicator of biomass)
Core (5 cores were collected from each chamber; three were analysed for chl a). These are pseudoreplicates/subsamples from the chambers and should not be treated as replicates.
####
Associated R script file for pump cycles of respiration chambers
Associated respiration chamber data to determine the times when respiration chamber pumps delivered treatment water to chambers. Determined from Aquation log files (see associated files). Use the chamber cut times to determine net production rates. Note: Users need to avoid the times when the respiration chambers are delivering water as this will give incorrect results. The headers that get used in the attached/associated R file are start regression and end regression. The remaining headers are not used unless called for in the associated R script. The last columns of these datasets (intercept, ElapsedTimeMincoef) are determined from the linear regressions described below.
To determine the rate of change of net production, coefficients of the regression of oxygen consumption in discrete 180 minute data blocks were determined. R squared values for fitted regressions of these coefficients were consistently high (greater than 0.9). We make two assumptions with calculation of net production rates: the first is that heterotrophic community members do not change their metabolism under OA; and the second is that the heterotrophic communities are similar between treatments.
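An illustrative sketch of that block-wise regression (in Python rather than the associated R script; the file name and column names are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("respiration_chamber.csv", parse_dates=["DATETIME"])  # hypothetical file
df["elapsed_min"] = (df["DATETIME"] - df["DATETIME"].iloc[0]).dt.total_seconds() / 60

rates = []
for _, block in df.groupby(df["elapsed_min"] // 180):        # discrete 180-minute blocks
    slope, intercept = np.polyfit(block["elapsed_min"], block["DO"], 1)
    r2 = np.corrcoef(block["elapsed_min"], block["DO"])[0, 1] ** 2
    rates.append({"DO_slope_per_min": slope, "r_squared": r2})

print(pd.DataFrame(rates))
```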
####
Combined dataset pH, temperature, oxygen, salinity, velocity for experiment
This data is rapid light curve data generat...
This data card was written by ChatGPT. Some parts of the data card are not ready yet.
Although the dataset is publicly available, it is inconsistently maintained — file structures and column definitions vary between months. For analysis, all files have been downloaded and extracted into a single folder named “unzipped”, preserving their raw form.
Each CSV file typically contains the following columns (column names and availability may differ by month):
- rent_time: [description placeholder]
- rent_station: [description placeholder]
- return_time: [description placeholder]
- return_station: [description placeholder]
- rent: [description placeholder]
- type: [description placeholder — introduced around October 2024 to indicate bicycle type]
- infodate: [description placeholder]
YouBike 2.0 Taipei City Real-Time Information
- Source: https://data.nat.gov.tw/dataset/147580
- Provider: 臺北市政府交通局 (Taipei City Department of Transportation)
- Data URL (updated every minute): https://tcgbusfs.blob.core.windows.net/dotapp/youbike/v2/youbike_immediate.json
- Language: Chinese (Traditional)
- Format: JSON
This dataset provides real-time information on the YouBike 2.0 public bicycle system in Taipei City. The data include the location, capacity, and availability of bicycles and parking spaces at each station. Updates occur approximately every minute (but not in this dataset), and the dataset is published by the Taipei City Department of Transportation through the National Open Data Platform.
The dataset used in this project was collected for the purpose of obtaining station coordinates (latitude and longitude) and other static station information. These geographic data can be used to visualize the distribution of YouBike stations across Taipei City.
If users require up-to-date or continuously refreshed information, it is recommended to directly access the real-time JSON feed via the provided URL above.
Each record represents a single YouBike 2.0 station and includes the following fields:
- sno (站點代號 / Station ID): [description placeholder]
- sna (場站中文名稱 / Station Name - Chinese): [description placeholder]
- quantity (場站總停車格 / Total Parking Spaces): [description placeholder]
- available_rent_bikes (場站目前車輛數量 / Available Bicycles for Rent): [description placeholder]
- sarea (場站區域 / Administrative Area - Chinese): [description placeholder]
- mday (資料更新時間 / Record Update Time): [description placeholder]
- latitude (緯度 / Latitude): [description placeholder]
- longitude (經度 / Longitude): [description placeholder]
- ar (地點 / Address - Chinese): [description placeholder]
- sareaen (場站區域英文 / Administrative Area - English): [description placeholder]
- snaen (場站名稱英文 / Station Name - English): [description placeholder]
- aren (地址英文 / Address - English): [description placeholder]
- available_return_bikes (空位數量 / Available Parking Spaces): [description placeholder]
- act (全站禁用狀態 / Station Active Status): [description placeholder]
- srcUpdateTime (YouBike2.0系統發布資料更新的時間 / Source Update Time from YouBike System): [description placeholder]
- updateTime (大數據平台經過處理後將資料存入DB的時間 / Time When Data Were Processed and Stored in Database): [description placeholder]
- infoTime (各場站來源資料更新時間 / Station Data Update Time): [description placeholder]
- infoDate (各場站來源資料更新日期 / Station Data Update Date): [description placeholder]
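A minimal fetching sketch, assuming the requests library and that the feed returns a JSON array of station records with the fields listed above:

```python
import requests

URL = "https://tcgbusfs.blob.core.windows.net/dotapp/youbike/v2/youbike_immediate.json"
stations = requests.get(URL, timeout=10).json()

# Keep only the static fields needed to map station locations.
coords = [
    {"sno": s["sno"], "name": s.get("snaen", s["sna"]),
     "lat": float(s["latitude"]), "lon": float(s["longitude"])}
    for s in stations
]
print(len(coords), coords[0])
```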
(Chinese) - 臺北市政府交通局;2025;臺北市公共自行車2.0租借紀錄 - 此開放資料依政府資料開放授權條款 (Open Government Data License) 進行公眾釋出,使用者於遵守本條款各項規定之前提下,得利用之。 - 政府資料開放授權條款:https://data.gov.tw/license
(English) - Department of Transportation, Taipei City Government (DOT); 2025; 臺北市公共自行車2.0租借紀錄 - The Open Data is made available to the public under the Open Government Data License, User can make use of it when complying to the condition and obligation of its terms. - Open Government Data License:https://data.gov.tw/license
(Chinese) - 臺北市政府交通局;2025;YouBike2.0臺北市公共自行車即時資訊 - 此開放資料依政府資料開放授權條款 (Open Government Data License) 進行公眾釋出,使用者於遵守本條款各項規定之前提下,得利用之。 - 政府資料開放授權條款:https://data.gov.tw/license
(English) - Department of Transportation, Taipei City Government...
Summer 2013 CCE. This objective was accomplished using the following equipment and protocols:
• The primary goal of the survey is to estimate the biomasses, distributions, and biological compositions of Pacific hake and Pacific sardine populations using data from an integrated acoustic and trawl survey off the west coast of the U.S. and Canada from approximately San Diego, California (lat 32°48.0174’N) to the north end of Vancouver Island, Canada (lat 50°45.65’N).
• Continuously sample multi-frequency acoustic backscatter data using the ship’s Simrad EK60 scientific echo sounder system. These data will be used to estimate the distributions and abundances of hake and sardine.
• Conduct daytime trawling to classify observed backscatter layers to species and size composition and to collect specimens of hake and other organisms.
• Conduct nighttime (i.e., between sunset and sunrise) surface trawling to collect specimens of coastal pelagic fishes (CPS) and other organisms. These data will be used to classify observed backscatter to species and their size distributions. Nighttime sampling operations will conclude in time for the ship to resume running east-west acoustic transects by sunrise.
• Image fish using a portable X-radiograph machine for the purpose of target strength modeling and estimation.
• Collect a variety of other acoustic, biological, and oceanographic samples relevant to hake and sardine distributions. These data are vital for the surveys and assessments of hake and sardine.
• Continuously sample sea-surface temperature, salinity, and chlorophyll a using the ship’s thermosalinograph and fluorometer. These data will be used to estimate the physical oceanographic habitats for each target species.
• Continuously sample air temperature, barometric pressure, and wind speed and direction using the ship’s integrated weather station.
• Continuously sample pelagic fish eggs using the Continuous Underway Fish Egg Sampler (CUFES). The data will be used to estimate the distributions and abundances of spawning hake, anchovy, mackerel, and sardine.
• Sample profiles of temperature and salinity using either an underway conductivity-temperature-depth (CTD) system during the day or a standard CTD system with water-sampling rosette and other instruments at nighttime stations, as time allows.
• Sample plankton using a CalBOBL (CalCOFI Bongo) net at nighttime stations, as time allows. These data will be used to estimate the distribution and abundance of ichthyoplankton and zooplankton species.
• Continuously sample multi-frequency acoustic backscatter data using the ship’s Simrad ME70 multibeam echosounder system, synchronized and configured to not interfere with the EK60s.
• Optically verify CPS backscatter while underway conducting acoustic transects, using a towed stereo camera system.
• Optically observe fish behavior inside nighttime trawls using cameras and lights mounted inside the net.
Salutary Data is a boutique, B2B contact and company data provider that's committed to delivering high quality data for sales intelligence, lead generation, marketing, recruiting / HR, identity resolution, and ML / AI. Our database currently consists of 148MM+ highly curated B2B Contacts (US only), along with over 4MM+ companies, and is updated regularly to ensure we have the most up-to-date information.
We can enrich your in-house data (CRM Enrichment, Lead Enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use-case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end-users.
What makes Salutary unique?
- We offer our clients a truly unique, one-stop aggregation of the best-of-breed quality data sources. Our supplier network consists of numerous, established high quality suppliers that are rigorously vetted.
- We leverage third party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
- We're reasonably priced and easy to work with.
Products:
- API Suite
- Web UI
- Full and Custom Data Feeds
Services:
- Data Enrichment - We assess the fill rate gaps and profile your customer file for the purpose of appending fields, updating information, and/or rendering net new “look alike” prospects for your campaigns.
- ABM Match & Append - Send us your domain or other company related files, and we’ll match your Account Based Marketing targets and provide you with B2B contacts to campaign. Optionally throw in your suppression file to avoid any redundant records.
- Verification (“Cleaning/Hygiene”) Services - Address the 2% per month aging issue on contact records! We will identify duplicate records, contacts no longer at the company, rid your email hard bounces, and update/replace titles or phones. This is right up our alley and leverages our existing internal and external processes and systems.
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Microsoft Kinect is a motion sensing device invented by Microsoft, mainly used for joystick-free games with the Microsoft Xbox gaming console [1]. Microsoft Kinect has two versions: Kinect 360, which was released in 2010, and Kinect One, which was released in 2013 [2].
This dataset is collected in the form of Robotics Operating System (ROS) bags that contain both RGB and depth images collected from the Kinect V2 sensor. This dataset is collected as a part of the research done in 3D Object Detection and Classification Using Microsoft Kinect and Deep Neural Networks master's degree thesis. This thesis is done as part of the master's degree program at Cairo University.
You can access all the source code released in this thesis using the following links:
This dataset contains three data files in the form of ROS bags. These data files contain 6 types of objects, where 5 of them are stationary and the other is moving between different positions along the scene. The 6 object types are:
Human (moving),
Chair (stationary),
Sofa (stationary),
TV Monitor (stationary),
Bottle (stationary),
Books (stationary).
Each of these objects can be found at a certain depth position; the depth positions are as follows:
Small TV: 2.43 m,
Large TV: 4.43 m,
Black Chair: 1.81 m,
White Chair (to the right): 2.14 m,
Sofa: 1.4 m,
Books: 1.96 m,
Bottle: 1.71 m,
Human Position #1: 1.05 m,
Human Position #2: 2.66 m,
Human Position #3: 3.81 m,
Human Position #4: 4.54 m.
Darknet is used in the system introduced in this thesis work to get the bounding box of each of these objects.
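A minimal inspection sketch for the bags, assuming a ROS 1 environment with the rosbag Python API; the bag file name and topic names are hypothetical and should be replaced with the ones actually recorded:

```python
import rosbag  # ROS 1 Python API

with rosbag.Bag("scene1.bag") as bag:                      # hypothetical file name
    print(bag.get_type_and_topic_info().topics.keys())     # list the recorded topics
    for topic, msg, t in bag.read_messages(
            topics=["/kinect2/qhd/image_color", "/kinect2/qhd/image_depth_rect"]):
        print(t.to_sec(), topic, msg.height, msg.width)
        break                                              # inspect only the first message
```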
[1] A. S. Sabale and Y. M. Vaidya, "Accuracy measurement of depth using Kinect sensor," in 2016 Conference on Advances in Signal Processing (CASP), pp. 155–159, June 2016.
[2] "Kinect Sensor." https://en.wikipedia.org/wiki/Kinect, 2012. [Online; accessed 28-August-2018].
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset includes some on-chain indicator information on a daily basis for Bitcoin. The indicators are:
Inflow Volume
IntoTheBlock has built a proprietary machine learning powered classifier to identify addresses of top centralized exchanges, including their deposit addresses, withdrawal addresses, hot wallets and cold wallets. With this classifier, IntoTheBlock can measure the total amount of a given crypto-asset flowing into exchanges and measures this in dollar and crypto terms. The result is the Inflow Volume indicator.
Outflow Volume
While Inflow Volume at times anticipates volatility, Outflow Volume is often more reactive. In other words, Outflow Volume often spikes following either a crash or a significant break-out as shown in the example above. This could potentially be interpreted as users going long and opting to hold their crypto outside centralized exchanges.
Total Flows
IntoTheBlock uses machine learning algorithms to identify centralized exchanges’ deposit and withdrawal addresses. Through this process, IntoTheBlock measures the total activity flowing in and out of centralized exchanges. The result is the Total Flows indicator, which is measured the following way:
Total Flows = Inflow Volume + Outflow Volume
Net Flows
The Net Flows indicator highlights trends of traders sending money in and out of exchanges. Recall that Net Flows are positive when more funds are entering than leaving exchanges. Therefore, we observe that positive Net Flows tend to coincide with periods following large increases in price (like LINK when it tripled between April and July) or confirmation of down-trends (as seen with LINK in late August).
Conversely, Net Flows are negative when a greater volume is being withdrawn from exchanges. This could be seen as a sign of accumulation (LINK in early August) or addresses buying back following large declines (LINK in early September).
While Net Flows also affect large cap crypto-assets, smaller cap tokens are more susceptible to large changes in prices deriving from exchange flows. This is simply a result of smaller caps requiring less capital in order to make market-moving trades. This is worth considering when using the Net Flows indicator to trade.
Net Flows = Inflow Volume - Outflow Volume
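A minimal sketch of the two formulas above, assuming pandas and illustrative column names:

```python
import pandas as pd

# Illustrative daily values, not real indicator data.
df = pd.DataFrame({
    "inflow_volume":  [120.0, 95.5, 210.3],
    "outflow_volume": [100.2, 130.0, 180.1],
})

df["total_flows"] = df["inflow_volume"] + df["outflow_volume"]
df["net_flows"] = df["inflow_volume"] - df["outflow_volume"]  # positive => more funds entering exchanges
print(df)
```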
Outflow Transaction Count
The Outflow Transaction Count indicator provides an indication of users withdrawing their funds from centralized exchanges, likely to store them in safer cold wallets. This is a valuable approximation of users going long and opting to hold their own funds. For this reason, outflows tend to spike as price crashes, as pointed out in the example above. While this can be the case on several occasions, natural fluctuations in exchanges’ flows can often have smaller spikes without regard to price action as well.
Inflow Transaction Count
As the name suggests, the Inflow Transaction Count indicator provides the number of incoming crypto transactions entering exchanges. While the Inflow Volume measures the aggregate dollar amount, which is influenced by whales’ transactions, the Inflow Transaction Count is a better approximation of the number of users sending funds into exchanges.
This indicator has also shown to rise along and anticipate periods of high volatility. For example, on September 1st, inflow transactions for Bitcoin hit a 3-month high preceding a decrease in price of 14% over the following 48 hours. While this pattern does tend to emerge, natural fluctuations in inflow transactions can also increase at times.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multi-aspect Integrated Migration Indicators (MIMI) dataset is the result of the process of gathering, embedding and combining traditional migration datasets, mostly from sources like Eurostat and the UNSD Demographic Statistics Database, and alternative types of data, which consist of multidisciplinary features and measures not typically employed in migration studies, such as the Facebook Social Connectedness Index (SCI). Its purpose is to exploit these novel types of data for nowcasting migration flows and stocks, studying integration of multiple sources and knowledge, and investigating migration drivers. The MIMI dataset is designed to have a unique pair of countries for each row. Each record contains country-to-country information about migration flows and stocks, their share, their strength of Facebook connectedness, and other features, such as corresponding populations, GDP, coordinates, NET migration, and many others.
Methodology. After having collected bilateral flow records about international human mobility by citizenship, residence and country of birth (available for both sexes and, in some cases, for different age groups), they have been merged together in order to obtain a unique dataset in which each ordered couple (country-of-origin, country-of-destination) appears once. To avoid duplicate couples, flow records have been selected by following this priority: first migration by citizenship, then migration by residence and lastly by country of birth. The integration process started by choosing, collecting and meaningfully including many other indicators that could be helpful for the dataset's final purpose mentioned above:
- International migration stocks (having a five-year range of measurement) for each couple of countries.
- Geographical features for each country: ISO3166 name and official name, ISO3166-1 alpha-2 and alpha-3 codes, continent code and name of belonging, latitude and longitude of the centroid, list of bordering countries, country area in square kilometres. Also, the following feature has been included for each pair of countries: geodesic distance (in kilometres) computed between their respective centroids.
- Non-bidirectional migration measures for each country: total number of immigrants and emigrants for each year, NET migration and NET migration rate in a five-year range.
- Other multidisciplinary indicators (cultural, social, anthropological, demographical, historical features) related to each country: religion (single one or list), yearly GDP at PPP, spoken language (or list of languages), yearly population stocks (and population densities if available), number of Facebook users, percentage of Facebook users, cultural indicators (PDI, IDV, MAS, UAI, LTO). Also, the following feature has been included for each pair of countries: the Facebook Social Connectedness Index.
Once traditional and non-traditional knowledge is gathered and integrated, we move to the pre-processing phase where we manage the data cleaning, preparation and transformation. Here our dataset was subjected to various computational standard processes and additionally reshaped into the final structure established by our design choices. The data quality assessment phase was one of the longest and most delicate, since many values were missing and this could have had a negative impact on the quality of the desired resulting knowledge.
The missing values have been integrated from additional sources such as The World Bank, World Population Review, Statista, DataHub and Wikipedia, and in some cases extracted from Python libraries such as PyPopulation, CountryInfo and PyCountry. The final dataset has the structure of a huge matrix having country couples as index (uniquely identified by coupling their ISO 3166-1 alpha-2 codes): it comprises 28725 entries and 485 columns.
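A minimal loading sketch, assuming pandas and a hypothetical CSV export of the matrix whose first two columns are the origin and destination ISO 3166-1 alpha-2 codes:

```python
import pandas as pd

mimi = pd.read_csv("mimi.csv", index_col=[0, 1])   # hypothetical file name and column layout
print(mimi.shape)                                  # expected (28725, 485)
print(mimi.loc[("IT", "DE")])                      # all indicators for the IT -> DE country pair
```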
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Zenodo repository contains all migration flow estimates associated with the paper "Deep learning four decades of human migration." Evaluation code, training data, trained neural networks, and smaller flow datasets are available in the main GitHub repository, which also provides detailed instructions on data sourcing. Due to file size limits, the larger datasets are archived here.
Data is available in both NetCDF (.nc) and CSV (.csv) formats. The NetCDF format is more compact and pre-indexed, making it suitable for large files. In Python, datasets can be opened as xarray.Dataset objects, enabling coordinate-based data selection.
Each dataset uses the following coordinate conventions:
The following data files are provided:
T summed over Birth ISO). Dimensions: Year, Origin ISO, Destination ISO
Additionally, two CSV files are provided for convenience:
imm: Total immigration flows
emi: Total emigration flows
net: Net migration
imm_pop: Total immigrant population (non-native-born)
emi_pop: Total emigrant population (living abroad)
mig_prev: Total origin-destination flows
mig_brth: Total birth-destination flows, where Origin ISO reflects place of birth
Each dataset includes a mean variable (mean estimate) and a std variable (standard deviation of the estimate).
An ISO3 conversion table is also provided.
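A minimal selection sketch, assuming xarray and that the .nc files expose the coordinates and the mean/std variables described above (the file name and country codes are illustrative):

```python
import xarray as xr

ds = xr.open_dataset("mig_prev.nc")   # hypothetical file name
sel = {"Year": 2015, "Origin ISO": "MEX", "Destination ISO": "USA"}
print(float(ds["mean"].sel(sel)), float(ds["std"].sel(sel)))
```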
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Acknowledgement and Disclaimers
These data are a product of a research activity conducted in the context of the RAILS (Roadmaps for AI integration in the raiL Sector) project. RAILS has received funding from the Shift2Rail Joint Undertaking (JU) under the European Union’s Horizon 2020 research and innovation programme under grant agreement n. 881782 Rails. The JU receives support from the European Union’s Horizon 2020 research and innovation program and the Shift2Rail JU members other than the Union.
The information and views set out in this description are those of the author(s) and do not necessarily reflect the official opinion of Shift2Rail Joint Undertaking. The JU does not guarantee the accuracy of the data included in this dataset. Neither the JU nor any person acting on the JU’s behalf may be held responsible for the use which may be made of the information contained therein.
This "dataset" has been created for scientific purposes only to study the potentials of Deep Learning (DL) approaches when used to analyse Video Data in order to detect possible obstacles on rail tracks and thus avoid collisions. The authors DO NOT ASSUME any responsibility for the use that other researchers or users will make of these data.
Objectives of the Study
RAILS defined some pilot case studies to develop Proofs-of-Concept (PoCs), which are conceived as benchmarks, with the aim of providing insight towards the definition of technology roadmaps that could support future research and/or the deployment of AI applications in the rail sector. In this context, the main objectives of the specific PoC "Vision-Based Obstacle Detection on Rail Tracks" were to investigate: i) solutions for the generation of synthetic data, suitable for the training of DL models; and ii) the potential of DL applications when it comes to detecting any kind of obstacles on rail tracks while exploiting video data from a single RGB camera.
A Brief Overview of the Approach
A multi-modular approach has been proposed to achieve the objectives mentioned above. The resulting architecture includes the following modules:
The Rails Detection Module (RDM) detects rail tracks. The output of the RDM is used by the ODM and ADM.
The Object Detection Module (ODM) detects obstacles whose type is known in advance.
The Anomaly Detection Module (ADM) identifies any possible anomaly on rail tracks. These include obstacles whose type is not known in advance.
The Obstacle Detection Module merges the outputs from the ODM and the ADM.
The Distance Estimation Module estimates the distance of objects and anomalies from the train.
The research was specifically oriented at implementing the RDM-ADM pipeline. Indeed, the object detection approaches that would be used to implement the ODM have been widely investigated by the research community; instead, to the best of our knowledge, limited work has been done in the rail field in the context of anomaly detection. The RDM has been realised by adopting a Semantic Segmentation approach based on U-Net, while, to develop the ADM, a Vector-Quantized Variational Autoencoder trained in Unsupervised mode was leveraged. Further details can be found in the RAILS "Deliverable D2.3: WP2 Report on experimentation, analysis, and discussion of results".
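A hypothetical sketch of how the RDM and ADM could be chained, with placeholder models; the reconstruction-error criterion is one common way to use an autoencoder for anomaly detection and is shown only as an illustration, not as the project's exact decision rule:

```python
import torch

def detect_anomaly(frame, unet, vqvae, threshold):
    """frame: (1, 3, H, W) float tensor; unet and vqvae are stand-ins for the trained models."""
    with torch.no_grad():
        rail_mask = (unet(frame) > 0.5).float()     # RDM: keep only rail-track pixels
        masked = frame * rail_mask
        reconstruction = vqvae(masked)              # ADM: reconstruct the rail-only frame
        error = torch.mean((masked - reconstruction) ** 2)
    return error.item() > threshold                 # large error => possible obstacle/anomaly
```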
Steps to implement the RDM-ADM pipeline and description of shared data
The following list reports all the steps that have been performed to implement the RDM-ADM pipeline; the words in bold-italic refer to the files that are shared within this dataset:
A Railway Scenario was generated in MathWorks' RoadRunner.
A video (FreeTrackVideo) was recorded by simulating an RGB camera mounted in front of the train; no obstacles on rail tracks were considered in this phase.
2000 frames (FreeTrack2KFrames) were extracted from the aforementioned video. The video contains 4143 frames; however, only 2000 (every other frame, starting from the first one) were taken into account due to training time and GPU RAM constraints.
Only 10% of the 2000 frames (i.e., 200 frames, one frame every 10) were manually labelled using LabelMe; these frames were then subdivided into training and validation sets (InitialLabelledSet).
Then, a Semi-Automatic labelling algorithm was developed by leveraging self-training and transfer learning. This algorithm made it possible to label all the FreeTrack2KFrames starting from the InitialLabelledSet (a sketch of this self-training loop is given after this list). The resulting labels can be found in FreeTrack2KLabels.
Data Augmentation was then performed in order to introduce some randomness into the dataset (see the augmentation sketch after this list). Because of the same time and RAM constraints mentioned above, the FreeTrack2KFrames set was reduced further: 1600 frames were selected among the aforementioned 2000, and then 5 transformations (Bright, Dark, Rain, Shadow, and Sun Flare) were applied to obtain the dataset (FreeTrack16TrainSet, FreeTrack16ValSet, FreeTrack16TestSet) that was used to train, validate, and test the RDM.
Once the RDM was trained, the FreeTrackVideo was processed to obtain the masked frames that were then used to build the dataset(s) to train, validate, and test the ADM. The ADM was studied by considering two different datasets: the Non-Anomaly Dataset (NAD), which contains all the frames of the FreeTrackVideo once processed by the RDM; and the Augmented Non-Anomaly Dataset (A-NAD), which contains 9000 frames, 1500 of which were extracted from the NAD, while the remaining 7500 were obtained by applying the same transformations mentioned above.
Lastly, when both the RDM and the ADM were trained, the performance of the whole RDM-ADM pipeline was tested on the WithCarVideo, which depicts the same scenario as the FreeTrackVideo but also includes a car lying on the rail tracks (i.e., an obstacle).
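The semi-automatic labelling step mentioned above is essentially a self-training loop: a segmentation model is trained on the small hand-labelled set, used to predict masks for the remaining frames, and the most confident predictions are promoted to labels for the next round. The sketch below only illustrates this loop; the function names, the confidence measure, the threshold, and the stopping rule are placeholders, not the RAILS implementation.

```python
# Self-training sketch for semi-automatic labelling (placeholders throughout).

def train_segmentation_model(labelled):
    """Placeholder: fine-tune a segmentation model (e.g. a pretrained U-Net)
    on the currently labelled frames and return it."""
    return object()

def predict_mask_with_confidence(model, frame_id):
    """Placeholder: return (predicted_mask, confidence in [0, 1]) for a frame."""
    return None, 0.0

def semi_automatic_labelling(initial_labelled, unlabelled, threshold=0.9, rounds=5):
    labelled = dict(initial_labelled)   # frame_id -> mask
    remaining = set(unlabelled)         # frame ids still without a mask
    for _ in range(rounds):
        model = train_segmentation_model(labelled)
        promoted = {}
        for frame_id in remaining:
            mask, conf = predict_mask_with_confidence(model, frame_id)
            if conf >= threshold:
                promoted[frame_id] = mask   # accept confident predictions as labels
        if not promoted:
            break                           # nothing new to promote: stop early
        labelled.update(promoted)
        remaining.difference_update(promoted)
    return labelled
```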
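The five transformations used in the augmentation step (Bright, Dark, Rain, Shadow, and Sun Flare) are the kind of photometric and weather effects offered by common augmentation libraries. The following is a minimal sketch using albumentations; the specific transforms, parameters, and file names are illustrative assumptions, not the exact settings used to build FreeTrack16TrainSet/ValSet/TestSet.

```python
# Illustrative augmentation sketch (not the exact RAILS parameters).
# Each transform is applied separately, so one input frame yields several
# augmented variants, as described in the augmentation step above.
import cv2
import albumentations as A

transforms = {
    "bright": A.RandomBrightnessContrast(brightness_limit=(0.2, 0.4), contrast_limit=0, p=1.0),
    "dark": A.RandomBrightnessContrast(brightness_limit=(-0.4, -0.2), contrast_limit=0, p=1.0),
    "rain": A.RandomRain(p=1.0),
    "shadow": A.RandomShadow(p=1.0),
    "sun_flare": A.RandomSunFlare(p=1.0),
}

frame = cv2.imread("frame_0000.png")  # hypothetical file name
for name, transform in transforms.items():
    augmented = transform(image=frame)["image"]
    cv2.imwrite(f"frame_0000_{name}.png", augmented)
```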