This dataset is Preprocessed⚙️, Compressed🗜️, and Streamable📶!
The goal of this benchmark is to train models which can look at images of food items and detect the individual food items present in them. We use a novel dataset of food images collected through the MyFoodRepo app, where numerous volunteer Swiss users provide images of their daily food intake in the context of a digital cohort called Food & You. This growing data set has been annotated - or automatic annotations have been verified - with respect to segmentation, classification (mapping the individual food items onto an ontology of Swiss Food items), and weight/volume estimation.
Finding annotated food images is difficult. There are some databases with some annotations, but they tend to be limited in important ways. To put it bluntly: most food images on the internet are a lie. Search for any dish, and you’ll find beautiful stock photography of that particular dish. Same on social media: we share photos of dishes with our friends when the image is exceptionally beautiful. But algorithms need to work on real-world images. In addition, annotations are generally missing - ideally, food images would be annotated with proper segmentation, classification, and volume/weight estimates. With this 2022 iteration of the Food Recognition Benchmark, AIcrowd released v2.0 of the MyFoodRepo dataset, containing a training set of 39,962 food images with 76,491 annotations.
raw_data/public_training_set_release_2.0.tar.gz: Training Set -> 39,962 RGB food images -> 76,491 annotations -> 498 food classes
raw_data/public_validation_set_2.0.tar.gz: Validation Set -> 1,000 RGB food images -> 1,830 annotations -> 498 food classes
raw_data/public_test_release_2.0.tar.gz: Public Test Set -> Food Recognition Benchmark 2022
Kaggle Notebook - https://www.kaggle.com/sainikhileshreddy/how-to-use-the-dataset
# Option 1: load the copy attached to a Kaggle notebook as input data
import hub
ds = hub.dataset('/kaggle/input/food-recognition-2022/hub/train/')

# Option 2: stream the dataset directly from Activeloop Hub
import hub
ds = hub.dataset('hub://sainikhileshreddy/food-recognition-2022-train/')
# Wrap the dataset in a PyTorch dataloader (transform and batch_size are user-defined)
dataloader = ds.pytorch(num_workers=2, shuffle=True, transform=transform, batch_size=batch_size)
# Or materialise it as a TensorFlow dataset
ds_tensorflow = ds.tensorflow()
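For a quick end-to-end check, here is a minimal sketch (not part of the official starter kit) that wires the streamed dataset into a PyTorch loop; the placeholder transform and batch size are assumptions, so inspect ds.tensors before adapting it:

import hub

ds = hub.dataset('hub://sainikhileshreddy/food-recognition-2022-train/')
print(list(ds.tensors))  # see which tensors (images, masks, labels, ...) the dataset exposes

def transform(sample):
    # Placeholder: return the sample unchanged; add resizing / normalisation here.
    return sample

batch_size = 8  # placeholder value
dataloader = ds.pytorch(num_workers=2, shuffle=True, transform=transform, batch_size=batch_size)

for batch in dataloader:
    # Each batch is keyed by tensor name; shapes depend on the dataset.
    break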
The benchmark uses the official detection evaluation metrics used by COCO. The primary evaluation metric is AP @ IoU=0.50:0.05:0.95. The secondary evaluation metric is AR @ IoU=0.50:0.05:0.95. A further discussion of the evaluation metrics can be found on the challenge page.
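The same metrics can be computed locally with pycocotools; below is a hedged sketch assuming the ground truth and your model's detections are available as COCO-format JSON files (both file names are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('annotations.json')              # COCO-format ground truth (placeholder path)
coco_dt = coco_gt.loadRes('predictions.json')   # COCO-format results (placeholder path)

evaluator = COCOeval(coco_gt, coco_dt, iouType='segm')  # instance segmentation task
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP and AR averaged over IoU=0.50:0.05:0.95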
The dataset has been taken from the Food Recognition Benchmark 2022. You can find more details about the challenge at https://www.aicrowd.com/challenges/food-recognition-benchmark-2022
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID-19 dataset in Japan’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lisphilar/covid19-dataset-in-japan on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This is a COVID-19 dataset in Japan. It does not include the cases on the Diamond Princess cruise ship (Yokohama city, Kanagawa prefecture) or the Costa Atlantica cruise ship (Nagasaki city, Nagasaki prefecture). The dataset provides:
- Total number of cases in Japan
- The number of vaccinated people (new/experimental)
- The number of cases at prefecture level
- Metadata of each prefecture
Note: Lisphilar (author) uploads the same files to https://github.com/lisphilar/covid19-sir/tree/master/data
This dataset can be retrieved with CovsirPhy (Python library).
pip install covsirphy --upgrade
import covsirphy as cs
data_loader = cs.DataLoader()
japan_data = data_loader.japan()
# The number of cases (Total/each province)
clean_df = japan_data.cleaned()
# Metadata
meta_df = japan_data.meta()
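As a quick sanity check, the returned DataFrames can be inspected and saved with plain pandas; this is only a sketch and makes no assumptions about specific column names:

import covsirphy as cs

data_loader = cs.DataLoader()
japan_data = data_loader.japan()
clean_df = japan_data.cleaned()
meta_df = japan_data.meta()

# Inspect shapes and available columns, then keep local copies for analysis.
print(clean_df.shape, list(clean_df.columns))
print(meta_df.shape, list(meta_df.columns))
clean_df.to_csv('japan_cases_cleaned.csv', index=False)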
Please refer to CovsirPhy Documentation: Japan-specific dataset.
Note: Before analysing the data, please refer to Kaggle notebook: EDA of Japan dataset and COVID-19: Government/JHU data in Japan. The detailed explanation of the build process is discussed in Steps to build the dataset in Japan. If you find errors or have any questions, feel free to create a discussion topic.
covid_jpn_total.csv
Cumulative number of cases:
- PCR-tested / PCR-tested and positive
- with symptoms (to 08May2020) / without symptoms (to 08May2020) / unknown (to 08May2020)
- discharged
- fatal
The number of cases:
- requiring hospitalization (from 09May2020)
- hospitalized with mild symptoms (to 08May2020) / severe symptoms / unknown (to 08May2020)
- requiring hospitalization, but waiting in hotels or at home (to 08May2020)
In the primary source, some variables were removed on 09May2020; their values are NA in this dataset from 09May2020 onward.
The data was collected manually from the Ministry of Health, Labour and Welfare (MHLW) website:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)
The number of vaccinated people:
- Vaccinated_1st: the number of persons who received their first dose on the date
- Vaccinated_2nd: the number of persons who received their second dose on the date
- Vaccinated_3rd: the number of persons who received their third dose on the date
Data sources for vaccination:
- To 09Apr2021: 厚生労働省 HP 新型コロナワクチンの接種実績 (MHLW, COVID-19 vaccination records, in Japanese) and 首相官邸 新型コロナワクチンについて (Prime Minister's Office, about the COVID-19 vaccines, in Japanese)
- From 10Apr2021: Twitter: 首相官邸(新型コロナワクチン情報) (Prime Minister's Office, COVID-19 vaccine information)
covid_jpn_prefecture.csv
Cumulative number of cases:
- PCR-tested / PCR-tested and positive
- discharged
- fatal
The number of cases:
- requiring hospitalization (from 09May2020)
- hospitalized with severe symptoms (from 09May2020)
The data was collected manually from the Ministry of Health, Labour and Welfare (MHLW) website using a PDF-to-Excel converter:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)
Note: covid_jpn_prefecture.groupby("Date").sum() does not match covid_jpn_total. When you analyse nationwide data for Japan, please use the covid_jpn_total data.
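The mismatch can be checked directly with pandas; a hedged sketch, assuming both CSV files sit in the working directory and (as in the note above) carry a Date column, with the remaining numeric columns summed generically:

import pandas as pd

total_df = pd.read_csv('covid_jpn_total.csv', parse_dates=['Date'])
pref_df = pd.read_csv('covid_jpn_prefecture.csv', parse_dates=['Date'])

# Summing prefecture rows per date does NOT reproduce the national file (see the note above).
pref_daily = pref_df.groupby('Date').sum(numeric_only=True)
total_daily = total_df.groupby('Date').sum(numeric_only=True)

shared_cols = [c for c in pref_daily.columns if c in total_daily.columns]
print((total_daily[shared_cols] - pref_daily[shared_cols]).describe())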
covid_jpn_metadata.csv
- Population (Total, Male, Female): 厚生労働省 厚生統計要覧(2017年度)第1-5表 (MHLW Handbook of Health and Welfare Statistics, FY2017, Tables 1-5)
- Area (Total, Habitable): Wikipedia 都道府県の面積一覧 (list of prefecture areas) (2015)
- Hospital_bed: primary data from 厚生労働省 感染症指定医療機関の指定状況(平成31年4月1日現在) (designation status of designated medical institutions for infectious diseases, as of 1 April 2019), 厚生労働省 第二種感染症指定医療機関の指定状況(平成31年4月1日現在) (designation status of category II designated medical institutions, as of 1 April 2019), 厚生労働省 医療施設動態調査(令和2年1月末概数) (Survey of Medical Institutions, preliminary figures for the end of January 2020), and 厚生労働省 感染症指定医療機関について (about designated medical institutions for infectious diseases), with secondary data from COVID-19 Japan 都道府県別 感染症病床数 (number of infectious disease beds by prefecture)
- Clinic_bed: primary data from 医療施設動態調査(令和2年1月末概数) (Survey of Medical Institutions, preliminary figures for the end of January 2020)
- Location: data from LinkData 都道府県庁所在地 (prefectural capital locations) (Public Domain) (secondary data)
Admin
To create this dataset, data from the following sites was edited and transformed.
厚生労働省 Ministry of Health, Labour and Welfare, Japan:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)
厚生労働省 HP 利用規約・リンク・著作権等 (terms of use, links, copyright, etc.) CC BY 4.0 (in Japanese)
国土交通省 Ministry of Land, Infrastructure, Transport and Tourism, Japan:
国土交通省 HP (in Japanese)
国土交通省 HP (in English)
国土交通省 HP 利用規約・リンク・著作権等 (terms of use, links, copyright, etc.) CC BY 4.0 (in Japanese)
Code for Japan / COVID-19 Japan:
Code for Japan COVID-19 Japan Dashboard (CC BY 4.0)
COVID-19 Japan 都道府県別 感染症病床数 (number of infectious disease beds by prefecture) (CC BY)
Wikipedia: Wikipedia
LinkData: LinkData (Public Domain)
Kindly cite this dataset under the CC BY 4.0 license as follows:
- Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan, or
- Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, Kaggle Dataset, https://www.kaggle.com/lisphilar/covid19-dataset-in-japan
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory contains the training data and code for training and testing a ResMLP with experience replay, used to build a machine-learning physics parameterization for the Community Atmosphere Model (CAM).
The workflow and directory contents are as follows:
1. Download training and testing data: https://portal.nersc.gov/archive/home/z/zhangtao/www/hybird_GCM_ML
2. Unzip nncam_training.zip
nncam_training
- models
model definition of ResMLP and other models for comparison purposes
- dataloader
utility scripts to load data into pytorch dataset
- training_scripts
scripts to train ResMLP model with/without experience replay
- offline_test
scripts to perform offline test (Table 2, Figure 2)
3. Unzip nncam_coupling.zip
nncam_srcmods
- SourceMods
SourceMods to be used with CAM modules for coupling with neural network
- otherfiles
additional configuration files to setup and run SPCAM with neural network
- pythonfiles
python scripts to run neural network and couple with CAM
- ClimAnalysis
- paper_plots.ipynb
scripts to produce online evaluation figures (Figure 1, Figure 3-10)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Heat pumps are essential for decarbonizing residential heating but consume substantial electrical energy, impacting operational costs and grid demand. Many systems run inefficiently due to planning flaws, operational faults, or misconfigurations. While optimizing performance requires skilled professionals, labor shortages hinder large-scale interventions. However, digital tools and improved data availability create new service opportunities for energy efficiency, predictive maintenance, and demand-side management. To support research and practical solutions, we present an open-source dataset of electricity consumption from 1,408 households with heat pumps and smart electricity meters in the canton of Zurich, Switzerland, recorded at 15-minute and daily resolutions between 2018-11-03 and 2024-03-21. The dataset includes household metadata, weather data from 8 stations, and ground truth data from 410 field visit protocols collected by energy consultants during system optimizations. Additionally, the dataset includes a Python-based data loader to facilitate seamless data processing and exploration.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.
There are two files:
sentence_pairs_for_pretrain_no_tokenization.tar.gz -> contains only sentences as evidence (Text-only)
table_pairs_for_pretrain_no_tokenization.tar.gz -> at least one piece of evidence is a table (Hybrid)
The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.
For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT
Below is a sample code snippet to load the data
import webdataset as wds
# path to the uncompressed files, should be a directory with a set of tar files
url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
dataset = (
    wds.Dataset(url)
    .shuffle(1000)      # cache 1000 samples and shuffle
    .decode()
    .to_tuple("json")
    .batched(20)        # group every 20 examples into a batch
)
# Please see the documentation for WebDataset for more details about how to use it as dataloader for Pytorch
# You can also iterate through all examples and dump them with your preferred data format
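One way (a sketch, not from the original repo) to feed these shards into a training loop is to wrap the pipeline above in a standard PyTorch DataLoader with batch_size=None, since .batched(20) has already grouped the examples:

import torch
import webdataset as wds

url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
dataset = (
    wds.Dataset(url)   # older webdataset releases; newer ones expose wds.WebDataset
    .shuffle(1000)
    .decode()
    .to_tuple("json")
    .batched(20)
)

loader = torch.utils.data.DataLoader(dataset, batch_size=None, num_workers=2)
for batch in loader:
    examples = batch[0]  # a list of 20 decoded JSON examples
    break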
Below we show how the data is organized with two examples.
Text-only
{'s1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
's1_all_links': {
'Sils,_Girona': [[0, 4]],
'municipality': [[10, 22]],
'Comarques_of_Catalonia': [[30, 37]],
'Selva': [[41, 46]],
'Catalonia': [[51, 60]]
}, # list of entities and their mentions in the sentence (start, end location)
'pairs': [ # other sentences that share common entity pair with the query, group by shared entity pairs
{
'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
's1_pair_locs': [[[30, 37]], [[41, 46]]], # mention of the entity pair in the query
's2s': [ # list of other sentences that contain the common entity pair, or evidence
{
'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context. 's_loc' is the start/end location of the actual evidence sentence
'pair_locs': [ # mentions of the entity pair in the evidence
[[19, 27]], # mentions of entity 1
[[0, 5], [288, 293]] # mentions of entity 2
],
'all_links': {
'Selva': [[0, 5], [288, 293]],
'Comarques_of_Catalonia': [[19, 27]],
'Catalonia': [[40, 49]]
}
}
,...] # there are multiple evidence sentences
},
,...] # there are multiple entity pairs in the query
}
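As a quick way to read such a record, here is a small helper (a sketch assuming `example` is one decoded Text-only record with exactly the keys shown above) that prints every evidence sentence per shared entity pair:

def print_evidence(example):
    print('query:', example['s1_text'])
    for group in example['pairs']:
        ent_a, ent_b = group['pair']
        for ev in group['s2s']:
            start, end = ev['s_loc']  # span of the evidence sentence inside its surrounding context
            print(f'  shared pair ({ent_a}, {ent_b}): {ev["text"][start:end]}')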
Hybrid
{'s1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
's1_all_links': {...}, # same as text-only
'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
'table_pairs': [
'tid': 'Major_League_Baseball-1',
'text':[
['World Series Records', 'World Series Records', ...],
['Team', 'Number of Series won', ...],
['St. Louis Cardinals (NL)', '11', ...],
...] # table content, list of rows
'index':[
[[0, 0], [0, 1], ...],
[[1, 0], [1, 1], ...],
...] # index of each cell [row_id, col_id]. we keep only a table snippet, but the index here is from the original table.
'value_ranks':[
[0, 0, ...],
[0, 0, ...],
[0, 10, ...],
...] # if the cell contain numeric value/date, this is its rank ordered from small to large, follow TAPAS
'value_inv_ranks': [], # inverse rank
'all_links':{
'St._Louis_Cardinals': {
'2': [
[[2, 0], [0, 19]], # [[row_id, col_id], [start, end]]
] # list of mentions in the second row, the key is row_id
},
'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]},
}
'name': '', # table name, if exists
'pairs': {
'pair': ['American_League', 'National_League'],
's1_pair_locs': [[[137, 152]], [[162, 177]]], # mention in the query
'table_pair_locs': {
'17': [ # mention of entity pair in row 17
[
[[17, 0], [3, 18]],
[[17, 1], [3, 18]],
[[17, 2], [3, 18]],
[[17, 3], [3, 18]]
], # mention of the first entity
[
[[17, 0], [21, 36]],
[[17, 1], [21, 36]],
] # mention of the second entity
]
}
}
]
}
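For the Hybrid records, table mentions can be mapped back to cell text via the 'index' field; a minimal sketch, assuming `table` is one element of 'table_pairs' with the keys shown above:

def cell_text(table, row_id, col_id):
    # 'index' stores the original [row_id, col_id] of every cell kept in the snippet.
    for r, row in enumerate(table['index']):
        for c, cell in enumerate(row):
            if cell == [row_id, col_id]:
                return table['text'][r][c]
    return None  # the cell was cropped out of the snippet

def linked_mentions(table):
    # Yield (entity, cell text, mention substring) for every link in the table.
    for entity, rows in table['all_links'].items():
        for row_id, mentions in rows.items():
            for row_col, span in mentions:  # [[row_id, col_id], [start, end]]
                text = cell_text(table, *row_col)
                yield entity, text, None if text is None else text[span[0]:span[1]]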
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# FireSR Dataset
## Overview
**FireSR** is a dataset designed for the super-resolution and segmentation of wildfire-burned areas. It includes data for all wildfire events in Canada from 2017 to 2023 that exceed 2000 hectares in size, as reported by the National Burned Area Composite (NBAC). The dataset aims to support high-resolution daily monitoring and improve wildfire management using machine learning techniques.
## Dataset Structure
The dataset is organized into several directories, each containing data relevant to different aspects of wildfire monitoring:
- **S2**: Contains Sentinel-2 images.
- **pre**: Pre-fire Sentinel-2 images (high resolution).
- **post**: Post-fire Sentinel-2 images (high resolution).
- **mask**: Contains NBAC polygons, which serve as ground truth masks for the burned areas.
- **pre**: Burned area labels from the year before the fire, using the same spatial bounds as the fire events of the current year.
- **post**: Burned area labels corresponding to post-fire conditions.
- **MODIS**: Contains post-fire MODIS images (lower resolution).
- **LULC**: Contains land use/land cover data from ESRI Sentinel-2 10-Meter Land Use/Land Cover (2017-2023).
- **Daymet**: Contains weather data from Daymet V4: Daily Surface Weather and Climatological Summaries.
### File Naming Convention
Each GeoTIFF (.tif) file follows the naming pattern `CA_<year>_<province>_<fire ID>.tif`, e.g. `CA_2017_AB_204.tif` (see the directory listing below).
### Directory Structure
The dataset is organized as follows:
```
FireSR/
│
├── dataset/
│ ├── S2/
│ │ ├── post/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ │ ├── pre/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ ├── mask/
│ │ ├── post/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ │ ├── pre/
│ │ │ ├── CA_2017_AB_204.tif
│ │ │ ├── CA_2017_AB_2418.tif
│ │ │ └── ...
│ ├── MODIS/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
│ ├── LULC/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
│ ├── Daymet/
│ │ ├── CA_2017_AB_204.tif
│ │ ├── CA_2017_AB_2418.tif
│ │ └── ...
```
### Spatial Resolution and Channels
- **Sentinel-2 (S2) Images**: 20 meters (Bands: B12, B8, B4)
- **MODIS Images**: 250 meters (Bands: B7, B2, B1)
- **NBAC Burned Area Labels**: 20 meters (1 channel, binary classification: burned/unburned)
- **Daymet Weather Data**: 1000 meters (7 channels: dayl, prcp, srad, swe, tmax, tmin, vp)
- **ESRI Land Use/Land Cover Data**: 10 meters (1 channel with 9 classes: water, trees, flooded vegetation, crops, built area, bare ground, snow/ice, clouds, rangeland)
**Daymet Weather Data**: The Daymet dataset includes seven channels that provide various weather-related parameters, which are crucial for understanding and modeling wildfire conditions:
| Name | Units | Min | Max | Description |
|------|-------|-----|-----|-------------|
| dayl | seconds | 0 | 86400 | Duration of the daylight period, based on the period of the day during which the sun is above a hypothetical flat horizon. |
| prcp | mm | 0 | 544 | Daily total precipitation, sum of all forms converted to water-equivalent. |
| srad | W/m^2 | 0 | 1051 | Incident shortwave radiation flux density, averaged over the daylight period of the day. |
| swe | kg/m^2 | 0 | 13931 | Snow water equivalent, representing the amount of water contained within the snowpack. |
| tmax | °C | -60 | 60 | Daily maximum 2-meter air temperature. |
| tmin | °C | -60 | 42 | Daily minimum 2-meter air temperature. |
| vp | Pa | 0 | 8230 | Daily average partial pressure of water vapor. |
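If you want to work with the Daymet files programmatically, the sketch below reads one of the GeoTIFFs listed above and labels its bands with the seven channel names from the table; the band order is an assumption, so verify it against the file's metadata:

```python
import rasterio

DAYMET_CHANNELS = ["dayl", "prcp", "srad", "swe", "tmax", "tmin", "vp"]  # assumed band order

with rasterio.open("FireSR/dataset/Daymet/CA_2017_AB_204.tif") as src:
    data = src.read()  # shape: (bands, height, width)
    for name, band in zip(DAYMET_CHANNELS, data):
        print(f"{name}: min={band.min():.2f}, max={band.max():.2f}")
```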
**ESRI Land Use/Land Cover Data**: The ESRI 10m Annual Land Cover dataset provides a time series of global maps of land use and land cover (LULC) from 2017 to 2023 at a 10-meter resolution. These maps are derived from ESA Sentinel-2 imagery and are generated by Impact Observatory using a deep learning model trained on billions of human-labeled pixels. Each map is a composite of LULC predictions for 9 classes throughout the year, offering a representative snapshot of each year.
| Class Value | Land Cover Class |
|-------------|------------------|
| 1 | Water |
| 2 | Trees |
| 4 | Flooded Vegetation |
| 5 | Crops |
| 7 | Built Area |
| 8 | Bare Ground |
| 9 | Snow/Ice |
| 10 | Clouds |
| 11 | Rangeland |
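The class values above can be mapped to readable names when analyzing the LULC rasters; a small sketch (values 3 and 6 do not appear in the 9-class table above):

```python
import numpy as np
import rasterio

LULC_CLASSES = {1: "Water", 2: "Trees", 4: "Flooded Vegetation", 5: "Crops", 7: "Built Area",
                8: "Bare Ground", 9: "Snow/Ice", 10: "Clouds", 11: "Rangeland"}

with rasterio.open("FireSR/dataset/LULC/CA_2017_AB_204.tif") as src:
    lulc = src.read(1)  # single-band raster of class values

values, counts = np.unique(lulc, return_counts=True)
for v, n in zip(values, counts):
    print(LULC_CLASSES.get(int(v), f"unknown ({v})"), n)
```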
## Usage Tutorial
To help users get started with FireSR, we provide a comprehensive tutorial with scripts for data extraction and processing. Below is an example workflow:
### Step 1: Extract FireSR.tar.gz
```bash
tar -xvf FireSR.tar.gz
```
### Step 2: Tiling the GeoTIFF Files
The dataset contains high-resolution GeoTIFF files. For machine learning models, it may be useful to tile these images into smaller patches. Here's a Python script to tile the images:
```python
import os

import rasterio
from rasterio.windows import Window

def tile_image(image_path, output_dir, tile_size=128):
    """Cut a GeoTIFF into tile_size x tile_size patches and save each patch as a GeoTIFF."""
    os.makedirs(output_dir, exist_ok=True)
    with rasterio.open(image_path) as src:
        for i in range(0, src.height, tile_size):
            for j in range(0, src.width, tile_size):
                window = Window(j, i, tile_size, tile_size)
                transform = src.window_transform(window)
                outpath = os.path.join(output_dir, f"{os.path.basename(image_path).split('.')[0]}_{i}_{j}.tif")
                # boundless=True pads edge tiles so every patch has the full tile size
                data = src.read(window=window, boundless=True, fill_value=0)
                with rasterio.open(outpath, 'w', driver='GTiff', height=tile_size, width=tile_size,
                                   count=src.count, dtype=src.dtypes[0], crs=src.crs, transform=transform) as dst:
                    dst.write(data)

# Example usage
tile_image('FireSR/dataset/S2/post/CA_2017_AB_204.tif', 'tiled_images/')
```
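The same function can be applied across a whole split; the sketch below tiles every post-fire Sentinel-2 scene (the output directory name is a placeholder):

```python
import glob

# tile_image is defined in the snippet above
for path in glob.glob('FireSR/dataset/S2/post/*.tif'):
    tile_image(path, 'tiled_images/')
```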
### Step 3: Loading Data into a Machine Learning Model
After tiling, the images can be loaded into a machine learning model using libraries like PyTorch or TensorFlow. Here's an example using PyTorch:
```python
import os

import rasterio
import torch
from torch.utils.data import Dataset, DataLoader

class FireSRDataset(Dataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        self.image_paths = [os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith('.tif')]

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        with rasterio.open(image_path) as src:
            image = src.read()  # (bands, height, width) numpy array, already channel-first
        if self.transform:
            image = self.transform(image)
        return image

# Example usage: rasterio already returns channel-first arrays, so convert to a float tensor directly
dataset = FireSRDataset('tiled_images/', transform=lambda a: torch.from_numpy(a.astype('float32')))
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)
```
## License
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt the material as long as appropriate credit is given.
## Contact
For any questions or further information, please contact:
- Name: Eric Brune
- Email: ebrune@kth.se