100+ datasets found
  1. Monthly Modal Time Series

    • catalog.data.gov
    • data.transportation.gov
    Updated Jun 6, 2025
    Cite
    Federal Transit Administration (2025). Monthly Modal Time Series [Dataset]. https://catalog.data.gov/dataset/monthly-modal-time-series
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Federal Transit Administration
    Description

    Modal Service data and Safety & Security (S&S) public transit time series data delineated by transit agency, mode, year, and month. Includes all Full Reporters--transit agencies operating modes with more than 30 vehicles in maximum service--to the National Transit Database (NTD). This dataset will be updated monthly. The monthly ridership data is released one month after the month in which the service is provided. Records with null monthly service data reflect late reporting. The S&S statistics provided include both Major and Non-Major Events where applicable. Events occurring in the past three months are excluded from the corresponding monthly ridership rows in this dataset while they undergo validation. This dataset is the only NTD publication in which all Major and Non-Major S&S data are presented without any adjustment for historical continuity.

  2. TFH_Annotated_Dataset

    • paperswithcode.com
    Updated Sep 6, 2022
    Cite
    (2022). TFH_Annotated_Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/tfh-annotated-dataset
    Dataset updated
    Sep 6, 2022
    Description

    Dataset Introduction: TFH_Annotated_Dataset is an annotated patent dataset pertaining to thin-film head technology in hard disks. To the best of our knowledge, it is the second labeled patent dataset publicly available in the technology management domain that annotates both entities and the semantic relations between entities; the first is the dataset of [1].

    The well-crafted information schema used for patent annotation contains 17 types of entities and 15 types of semantic relations as shown below.

    Table 1 The specification of entity types

    Type | Comment | Example
    physical flow | substance that flows freely | The etchant solution has a suitable solvent additive such as glycerol or methyl cellulose
    information flow | information data | A camera using a film having a magnetic surface for recording magnetic data thereon
    energy flow | entity relevant to energy | Conductor is utilized for producing writing flux in magnetic yoke
    measurement | method of measuring something | The curing step takes place at the substrate temperature less than 200.degree
    value | numerical amount | The curing step takes place at the substrate temperature less than 200.degree
    location | place or position | The legs are thinner near the pole tip than in the back gap region
    state | particular condition at a specific time | The MR elements are biased to operate in a magnetically unsaturated mode
    effect | change caused by an innovation | Magnetic disk system permits accurate alignment of magnetic head with spaced tracks
    function | manufacturing technique or activity | A magnetic head having highly efficient write and read functions is thereby obtained
    shape | the external form or outline of something | Recess is filled with non-magnetic material such as glass
    component | a part or element of a machine | A pole face of yoke is adjacent edge of element remote from surface
    attribution | a quality or feature of something | A pole face of yoke is adjacent edge of element remote from surface
    consequence | the result caused by something or activity | This prevents the slider substrate from electrostatic damage
    system | a set of things working together as a whole | A digital recording system utilizing a magnetoresistive transducer in a magnetic recording head
    material | the matter from which a thing is made | Interlayer may comprise material such as Ta
    scientific concept | terminology used in scientific theory | Peak intensity ratio represents an amount of hydrophilic radical
    other | not belonging to the above entity types | Pressure distribution across air bearing surface is substantially symmetrical side

    Table 2 The specification of relation types

    Type | Comment | Example
    spatial relation | specifies how one entity is located in relation to others | Gap spacer material is then deposited on the film knife-edge
    part-of | the ownership between two entities | a magnetic head has a magnetoresistive element
    causative relation | one entity operates as a cause of the other entity | Pressure pad carried another arm of spring urges film into contact with head
    operation | specifies the relation between an activity and its object | Heat treatment improves the (100) orientation
    made-of | one entity is the material for making the other entity | The thin film head includes a substrate of electrically insulative material
    instance-of | the relation between a class and its instance | At least one of the magnetic layer is a free layer
    attribution | one entity is an attribution of the other entity | The thin film has very high heat resistance of remaining stable at 700.degree
    generating | one entity generates another entity | Buffer layer resistor create impedance that noise introduced to head from disk of drive
    purpose | relation between reason and result | conductor is utilized for producing writing flux in magnetic yoke
    in-manner-of | doing something in a certain way | The linear array is angled at a skew angle
    alias | one entity is also known under another entity's name | The bias structure includes an antiferromagnetic layer AFM
    formation | an entity acts as a role of the other entity | Windings are joined at end to form center tapped winding
    comparison | compares one entity to the other | First end is closer to recording media use than second end
    measurement | one entity acts as a way to measure the other entity | This provides a relative permeance of at least 1000
    other | not belonging to the above types | Then, MR resistance estimate during polishing step is calculated from S value and K value

    There are 1,010 patent abstracts with 3,986 sentences in this corpus. We used the web-based annotation tool Brat [2] for data labeling, and the annotated data is saved in '.ann' format. The benefit of the '.ann' format is that the annotated data can be displayed and manipulated in Brat once TFH_Annotated_Dataset.zip is unzipped into the corresponding Brat data directory.
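    The '.ann' files follow Brat's standoff format, in which entity mentions appear as 'T' lines and relation mentions as 'R' lines. A minimal parsing sketch of that standard format (the file name is illustrative; this is not code shipped with the dataset, and discontinuous spans are not handled):

    # Collect entity ("T...") and relation ("R...") annotations from a .ann file.
    entities, relations = {}, []
    with open("patent_abstract.ann", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if parts[0].startswith("T"):    # e.g. "T1<TAB>component 10 23<TAB>magnetic head"
                etype, start, end = parts[1].split(" ")[:3]
                entities[parts[0]] = (etype, int(start), int(end), parts[2])
            elif parts[0].startswith("R"):  # e.g. "R1<TAB>part-of Arg1:T1 Arg2:T2"
                rtype, arg1, arg2 = parts[1].split(" ")
                relations.append((rtype, arg1.split(":")[1], arg2.split(":")[1]))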

    TFH_Annotated_Dataset contains 22,833 entity mentions and 17,412 semantic relation mentions. With TFH_Annotated_Dataset, we run two information extraction tasks: named entity recognition with BiLSTM-CRF [3] and semantic relation extraction with BiGRU-2ATTENTION [4]. To improve the semantic representation of patent language, the word embeddings are trained on the abstracts of 46,302 patents regarding magnetic heads in hard disk drives, which turns out to improve the performance of named entity recognition by 0.3% and of semantic relation extraction by about 2% in weighted-average F1, compared to GloVe and the patent word embeddings provided by Risch et al. [5].

    For named entity recognition, the weighted-average precision, recall, and F1-value of BiLSTM-CRF on entity level for the test set are 78.5%, 78.0%, and 78.2%, respectively. Although such performance is acceptable, it is still lower than its performance on general-purpose datasets by more than 10% in F1-value. The main reason is the limited amount of labeled data.

    The precision, recall, and F1-value for each type of entity are shown in Fig. 4. As for relation extraction, the weighted-average precision, recall, and F1-value of BiGRU-2ATTENTION for the test set are 89.7%, 87.9%, and 88.6% with no_edge relations, and 32.3%, 41.5%, and 36.3% without no_edge relations.

    Academic citation: Chen, L., Xu, S.*, Zhu, L. et al. A deep learning based method for extracting semantic information from patent documents. Scientometrics 125, 289–312 (2020). https://doi.org/10.1007/s11192-020-03634-y

    Paper link https://link.springer.com/article/10.1007/s11192-020-03634-y

    References

    [1] Pérez-Pérez, M., Pérez-Rodríguez, G., Vazquez, M., Fdez-Riverola, F., Oyarzabal, J., Valencia, A., Lourenço, A., & Krallinger, M. (2017). Evaluation of chemical and gene/protein entity recognition systems at BioCreative V.5: The CEMP and GPRO patents tracks. In Proceedings of the BioCreative V.5 challenge evaluation workshop, pp. 11–18.

    [2] Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., & Tsujii, J. (2012). BRAT: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 102–107.

    [3] Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.

    [4] Han, X., Gao, T., Yao, Y., Ye, D., Liu, Z., & Sun, M. (2019). OpenNRE: An open and extensible toolkit for neural relation extraction. arXiv preprint arXiv:1909.13078.

    [5] Risch, J., & Krestel, R. (2019). Domain-specific word embeddings for patent classification. Data Technologies and Applications, 53(1), 108–122.

  3. Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment

    • data.niaid.nih.gov
    Updated Jan 6, 2023
    Cite
    Miha Mohorčič (2023). Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7509279
    Dataset updated
    Jan 6, 2023
    Dataset provided by
    Andrej Hrovat
    Aleš Simončič
    Miha Mohorčič
    Mihael Mohorčič
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame part of a PR consists of variable-length fields, called Information Elements (IEs), which represent the capabilities of a mobile device, such as supported data rates.

    This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.

    It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.

    Related dataset

    The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, with the same data layout and recording equipment.

    Measurement setup

    The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.

    The following information about each received PR is collected:

    - MAC address
    - supported data rates
    - extended supported rates
    - HT capabilities
    - extended capabilities
    - data under extended tag and vendor specific tag
    - interworking
    - VHT capabilities
    - RSSI
    - SSID
    - timestamp when the PR was received

    The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.

    Data preprocessing

    The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:

    PR_IE_data = {
        'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
        'HT_CAP': DATA_htcap,
        'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
        'VHT_CAP': DATA_vhtcap,
        'INTERWORKING': DATA_inter,
        'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
        'VENDOR_SPEC': {
            VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1 ...},
            VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2 ...}
            ...
        }
    }

    Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
    Missing IE fields in the captured PR are not included in PR_IE_DATA.

    When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:

    {'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },

    where PR_data is structured as follows:

    { 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.

    This data structure makes it possible to store only the time of arrival ('TIME') and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the already stored data for the same MAC in the current scan time interval. If identical PR IE data from the same MAC address is already stored, only the values for the keys 'TIME' and 'RSSI' are appended. If identical PR IE data from the same MAC address has not yet been received, the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png.

    At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
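    A minimal sketch of this per-interval aggregation; the function and variable names are illustrative, not the dataset's actual collection script:

    # Store one entry per MAC; per MAC, deduplicate probe requests by IE data.
    def add_probe_request(interval_store, mac, ssid, ie_data, toa, rssi):
        entry = interval_store.setdefault(
            mac, {'MAC': mac, 'SSIDs': [], 'PROBE_REQs': []})
        if ssid and ssid not in entry['SSIDs']:
            entry['SSIDs'].append(ssid)
        for pr in entry['PROBE_REQs']:
            if pr['DATA'] == ie_data:      # identical IE data already stored:
                pr['TIME'].append(toa)     # append only time of arrival and RSSI
                pr['RSSI'].append(rssi)
                return
        entry['PROBE_REQs'].append({'TIME': [toa], 'RSSI': [rssi], 'DATA': ie_data})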

    Folder structure

    For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, each containing the samples recorded by that device.

    The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from 23 September 2022 00:00 local time until 24 September 2022 00:00 local time.

    Files represent their recording location via the following mapping:

    - 1.json -> location 1
    - 2.json -> location 2
    - 3.json -> location 3
    - 4.json -> location 4

    Environments description

    The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongles) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.

    Four Raspberry Pis were used:

    - location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
    - location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo
    - location 3 -> northernmost window in the building of Via Etnea near Piazza Università
    - location 4 -> first window to the right of the entrance of the University of Catania

    Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the locations of the devices and their coverage areas would cover both squares and the part of Via Etnea between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.

    Known dataset shortcomings

    Due to technical and physical limitations, the dataset contains some identified deficiencies.

    PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.

    Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can stop responding. For this reason, up to 20 seconds of data are not recorded in each 20-minute period.

    The devices had a scheduled reboot at 4:00 each day, which shows up as missing data of up to a few minutes.

    Location 1 - Piazza del Duomo - Chierici

    The gateway device (RPi) is located on the second-floor balcony and is hardwired to the Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location is constant and undisturbed; the dataset appears to have complete coverage from this device.

    Location 2 - Via Etnea - Piazza del Duomo

    The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill; however, the exact movements of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days of the recording contain no PRs from this location.

    Location 3 - Via Etnea - Piazza Università

    Similar to location 2, the device was placed on the windowsill and moved around by people working in the building. Similar behavior is also observed, e.g., it was placed on the windowsill and moved inside, behind a thick wall, when no people were present. This device appears to have been collecting data throughout the whole dataset period.

    Location 4 - Piazza Università

    This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device lost power several times during the deployment. The internet connection was also interrupted sporadically.

    Recognitions

    The data was collected within the scope of the RESILOC project with the help of the City of Catania and project partners.

  4. Data from: SLTrans Dataset

    • paperswithcode.com
    • huggingface.co
    Updated Mar 5, 2024
    Cite
    Indraneil Paul; Goran Glavaš; Iryna Gurevych (2024). SLTrans Dataset [Dataset]. https://paperswithcode.com/dataset/sltrans
    Dataset updated
    Mar 5, 2024
    Authors
    Indraneil Paul; Goran Glavaš; Iryna Gurevych
    Description

    The dataset consists of source code and LLVM IR pairs generated from accepted and de-duplicated programming contest solutions. The dataset is divided into language configs and mode splits. The language can be one of C, C++, D, Fortran, Go, Haskell, Nim, Objective-C, Python, Rust, and Swift, indicating the source files' language. The mode split indicates the compilation mode, which can be either Size_Optimized or Perf_Optimized.
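    Assuming the Hugging Face mirror exposes the layout described above (one config per source language, one split per compilation mode), loading could look like the sketch below; the dataset id and the exact config/split strings are assumptions to verify on the hub page:

    from datasets import load_dataset  # pip install datasets

    # Hypothetical id and names: the config selects the source language and the
    # split selects the compilation mode.
    ds = load_dataset("UKPLab/SLTrans", "C", split="Size_Optimized")
    print(ds[0].keys())  # expect paired source-code and LLVM-IR fields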

  5. Ginga LAC Mode Catalog

    • s.cnmilf.com
    • catalog.data.gov
    Updated Apr 24, 2025
    Cite
    High Energy Astrophysics Science Archive Research Center (2025). Ginga LAC Mode Catalog [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/ginga-lac-mode-catalog
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    High Energy Astrophysics Science Archive Research Center
    Description

    The GINGAMODE database table contains selected information from the Large Area Counter (LAC) aboard the third Japanese X-ray astronomy satellite Ginga. The Ginga experiment began on day 36, 5 February 1987, and ended in November 1991. Ginga consisted of the LAC, the all-sky monitor (ASM) and the gamma-ray burst detector (GBD). The satellite was in a circular orbit at 31 degrees inclination with apogee 670 km and perigee 510 km, and with a period of 96 minutes. A Ginga observation consisted of varying numbers of major frames which had lengths of 4, 32, or 128 seconds, depending on the setting of the bitrate. Each GINGAMODE database entry consists of data from the first record of a series of observations having the same values of the following: "BITRATE", "LACMODE", "DISCRIMINATOR", or "ACS MONITOR". When any of these changed, a new entry was written into GINGAMODE. The other Ginga catalog database, GINGALOG, is also a subset of the same LAC dump file used to create GINGAMODE. GINGALOG contains a listing only whenever the "ACS monitor" (Attitude Control System) changes. Thus, GINGAMODE monitors changes in four parameters, while GINGALOG is a basic log database mapping the individual FITS files. Ginga FITS files may have more than one entry in the GINGAMODE database. Both databases point to the same archived Flexible Image Transport System (FITS) files created from the LAC dump files. The user is invited to browse through the observations available from Ginga using GINGALOG or GINGAMODE, then extract the FITS files for more detailed analysis. The Ginga LAC Mode Catalog was prepared from data sent to NASA/GSFC by the Institute of Space and Astronautical Science (ISAS) in Japan.

    Duplicate entries were removed from the HEASARC implementation of this catalog in June 2019. This is a service provided by NASA HEASARC.

  6. Good Growth Plan 2014-2019 - Japan

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Jan 27, 2023
    Cite
    Syngenta (2023). Good Growth Plan 2014-2019 - Japan [Dataset]. https://microdata.worldbank.org/index.php/catalog/5634
    Dataset updated
    Jan 27, 2023
    Dataset authored and provided by
    Syngenta
    Time period covered
    2014 - 2019
    Area covered
    Japan
    Description

    Abstract

    Syngenta is committed to increasing crop productivity and to using limited resources such as land, water and inputs more efficiently. Since 2014, Syngenta has been measuring trends in agricultural input efficiency on a global network of real farms. The Good Growth Plan dataset shows aggregated productivity and resource efficiency indicators by harvest year. The data has been collected from more than 4,000 farms and covers more than 20 different crops in 46 countries. The data (except USA data and data for barley in the UK, Germany, Poland, Czech Republic, France and Spain) was collected, consolidated and reported by Kynetec (previously Market Probe), an independent market research agency. It can be used as a benchmark for crop yield and input efficiency.

    Geographic coverage

    National coverage

    Analysis unit

    Agricultural holdings

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A. Sample design: Farms are grouped in clusters, which represent a crop grown in an area with homogeneous agro-ecological conditions and include comparable types of farms. The sample includes reference and benchmark farms. The reference farms were selected by Syngenta, and the benchmark farms were randomly selected by Kynetec within the same cluster.

    B. Sample size: Sample sizes for each cluster are determined with the aim of measuring statistically significant increases in crop efficiency over time. This is done by Kynetec based on target productivity increases and assumptions regarding the variability of farm metrics in each cluster. The smaller the expected increase, the larger the sample size needed to measure significant differences over time. Variability within clusters is assumed based on public research and expert opinion. In addition, growers are also grouped in clusters as a means of keeping variance under control, as well as distinguishing between growers in terms of crop size, region and technological level. A minimum sample size of 20 interviews per cluster is needed. The minimum number of reference farms is 5 of 20. The optimal number of reference farms is 10 of 20 (balanced sample).

    C. Selection procedure: The respondents were picked randomly using a “quota based random sampling” procedure. Growers were first randomly selected and then checked for compliance with the quotas for crops, region, farm size, etc. To avoid clustering a high number of interviews at one sampling point, interviewers were instructed to do a maximum of 5 interviews in one village.

    Benchmark farms (BF) screened in Japan were selected based on the following criteria:

    - Location: Hokkaido Tokachi (JA Memuro, JA Otofuke, JA Tokachi Shimizu, JA Obihiro Taisho); the initial focus was on Memuro, Otofuke, Tokachi Shimizu and Obihiro Taisho. Locations added in GGP 2015 due to a change of RF: Obihiro, Kamikawa, Abashiri.
    - BF: no use of in-furrow application (Amigo), no use of Amistar.
    - Contract farmers of snack and other food companies. Screening question: "Do you have quality contracts in place with snack and food companies for your potato production?" (Y/N; if no, screen out.)
    - Increase of marketable yield. Screening question: "Are you interested in growing branded potatoes (premium potatoes for the processing industry)?" (Y/N; if no, screen out.)
    - Potato growers for process use.

    Background info: no mention of Syngenta. Labor cost is a very serious issue: in general, labor cost in Japan is very high, and growers try to reduce it through mechanization; they would like to manage the share of labor cost in production cost. Growers are quality and yield driven.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Data collection tool for 2019 covered the following information:

    (A) PRE-HARVEST INFORMATION

    PART I: Screening
    PART II: Contact Information
    PART III: Farm Characteristics
    a. Biodiversity conservation
    b. Soil conservation
    c. Soil erosion
    d. Description of growing area
    e. Training on crop cultivation and safety measures
    PART IV: Farming Practices - Before Harvest
    a. Planting and fruit development - Field crops
    b. Planting and fruit development - Tree crops
    c. Planting and fruit development - Sugarcane
    d. Planting and fruit development - Cauliflower
    e. Seed treatment

    (B) HARVEST INFORMATION

    PART V: Farming Practices - After Harvest
    a. Fertilizer usage
    b. Crop protection products
    c. Harvest timing & quality per crop - Field crops
    d. Harvest timing & quality per crop - Tree crops
    e. Harvest timing & quality per crop - Sugarcane
    f. Harvest timing & quality per crop - Banana
    g. After harvest
    PART VI: Other Inputs - After Harvest
    a. Input costs
    b. Abiotic stress
    c. Irrigation

    See all questionnaires in the external materials tab.

    Cleaning operations

    Data processing:

    Kynetec uses SPSS (Statistical Package for the Social Sciences) for data entry, cleaning, analysis, and reporting. After collection, the farm data is entered into a local database, reviewed, and quality-checked by the local Kynetec agency. In the case of missing values or inconsistencies, farmers are re-contacted. In some cases, grower data is verified with local experts (e.g. retailers) to ensure data accuracy and validity. After country-level cleaning, the farm-level data is submitted to the global Kynetec headquarters for processing. In the case of missing values or inconsistencies, the local Kynetec office is re-contacted to clarify and resolve issues.

    Quality assurance: Various consistency checks and internal controls are implemented throughout the entire data collection and reporting process in order to ensure unbiased, high-quality data.

    • Screening: Each grower is screened and selected by Kynetec based on cluster-specific criteria to ensure a comparable group of growers within each cluster. This helps keep variability low.

    • Evaluation of the questionnaire: The questionnaire aligns with the global objective of the project and is adapted to the local context (e.g. interviewers and growers should understand what is asked). Each year the questionnaire is evaluated based on several criteria, and updated where needed.

    • Briefing of interviewers: Each year, local interviewers - familiar with the local context of farming - are thoroughly briefed to fully comprehend the questionnaire and obtain unbiased, accurate answers from respondents.

    • Cross-validation of the answers:
      o Kynetec captures all growers' responses through a digital data-entry tool. Various logical and consistency checks are automated in this tool (e.g. total crop size in hectares cannot be larger than farm size).
      o Kynetec cross-validates the answers of the growers in three different ways:
        1. Within the grower (check if growers respond consistently during the interview)
        2. Across years (check if growers respond consistently throughout the years)
        3. Within cluster (compare a grower's responses with those of others in the group)
      o All the above-mentioned inconsistencies are followed up by contacting the growers and asking them to verify their answers. The data is updated after verification. All updates are tracked.

    • Check and discuss evolutions and patterns: Global evolutions are calculated, discussed and reviewed on a monthly basis jointly by Kynetec and Syngenta.

    • Sensitivity analysis: sensitivity analysis is conducted to evaluate the global results in terms of outliers, retention rates and overall statistical robustness. The results of the sensitivity analysis are discussed jointly by Kynetec and Syngenta.

    • It is recommended that users interested in using the administrative level 1 variable in the location dataset use this variable with care and crosscheck it with the postal code variable.

    Data appraisal

    Due to the above mentioned checks, irregularities in fertilizer usage data were discovered which had to be corrected:

    For data collection wave 2014, respondents were asked to give a total estimate of the fertilizer NPK rates that were applied in the fields. From 2015 onwards, the questionnaire was redesigned to be more precise and to obtain data by individual fertilizer product. The new method of measuring fertilizer inputs leads to more accurate results, but also makes a year-on-year comparison difficult. After evaluating several solutions to this problem, 2014 fertilizer usage (NPK input) was re-estimated by calculating a weighted average of fertilizer usage in the following years.

  7. TMD Dataset - 5 seconds sliding window

    • kaggle.com
    Updated Feb 5, 2019
    Cite
    Fernando Schwartzer (2019). TMD Dataset - 5 seconds sliding window [Dataset]. https://www.kaggle.com/fschwartzer/tmd-dataset-5-seconds-sliding-window
    Available download formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Feb 5, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Fernando Schwartzer
    Description

    Context

    Identifying a user’s transportation mode through observations of the user, or observations of the environment, is a growing topic of research, with many applications in the field of the Internet of Things (IoT). Transportation mode detection can provide context information useful for offering appropriate services based on the user’s needs and possibilities of interaction.

    Content

    Initial data pre-processing phase: data cleaning operations are performed, such as deleting measurements from sensors to be excluded and making the values of the sound and speed sensors positive.

    Furthermore, some sensors, such as the ambient sensors (sound, light and pressure) and the proximity sensor, return a single value as the result of sensing, which can be used in the dataset directly. All the others return multiple values that are tied to the coordinate system used, so their values are strongly related to device orientation. For almost all of them an orientation-independent metric, the magnitude, can be used, as sketched below.
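    For a three-axis sensor such as the accelerometer, the magnitude is the Euclidean norm of the axis readings. A generic sketch (not the dataset's own preprocessing code):

    import math

    def magnitude(x: float, y: float, z: float) -> float:
        """Orientation-independent magnitude of a 3-axis sensor reading."""
        return math.sqrt(x * x + y * y + z * z)

    # An accelerometer at rest reads roughly (0, 0, 9.81) in some orientation;
    # its magnitude, ~9.81 m/s^2, stays the same however the phone is held.
    print(magnitude(0.0, 0.0, 9.81))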

    Acknowledgements

    A sensor measures different physical quantities and provides corresponding raw sensor readings, which are a source of information about the user and their environment. Due to advances in sensor technology, sensors are getting more powerful, cheaper and smaller. Almost all mobile phones currently include sensors that allow the capture of important context information. For this reason, one of the key devices employed by context-aware applications is the mobile phone, which has become a central part of users' lives.

    Inspiration

    User transportation mode recognition can be considered a HAR (Human Activity Recognition) task. Its goal is to identify which kind of transportation - walking, driving, etc. - a person is using. Transportation mode recognition can provide context information to enhance applications and provide a better user experience; it can be crucial for many different applications, such as device profiling, monitoring road and traffic conditions, healthcare, and traveling support.

    Original dataset from: Carpineti C., Lomonaco V., Bedogni L., Di Felice M., Bononi L., "Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity", in Proceedings of the 14th Workshop on Context and Activity Modeling and Recognition (IEEE COMOREA 2018), Athens, Greece, March 19-23, 2018 [Pre-print available]

  8. GLO climate data stats summary

    • data.gov.au
    • researchdata.edu.au
    Updated Apr 13, 2022
    Cite
    Bioregional Assessment Program (2022). GLO climate data stats summary [Dataset]. https://data.gov.au/data/dataset/afed85e0-7819-493d-a847-ec00a318e657
    Available download formats: zip (8810)
    Dataset updated
    Apr 13, 2022
    Dataset authored and provided by
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Various climate variables summary for all 15 subregions based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids, including:

    1. Time series mean annual BAWAP rainfall from 1900 - 2012.

    2. Long-term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month.

    3. Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (vapour pressure deficit); (vii) Rn (net radiation); and (viii) wind speed. For each of the 17 time periods and each of the 8 meteorological variables, we calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    4. Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).

    As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

    There are 4 csv files here:

    BAWAP_P_annual_BA_SYB_GLO.csv

    Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.

    Source data: annual BILO rainfall

    P_PET_monthly_BA_SYB_GLO.csv

    Desc: Long-term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month.

    Climatology_Trend_BA_SYB_GLO.csv

    Desc: Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) wind speed. For each of the 17 time periods and each of the 8 meteorological variables, we calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv

    Desc: Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers for 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from the 0.05 degree gridded data described in Jeffrey et al. (2001; known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
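    A minimal sketch for loading one of these files with pandas; the column layout is not documented in this metadata, so inspect it after reading:

    import pandas as pd

    # Load the annual rainfall time series and inspect its columns.
    df = pd.read_csv("BAWAP_P_annual_BA_SYB_GLO.csv")
    print(df.columns.tolist())
    print(df.head())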

    Dataset History

    Dataset was created from various BAWAP source data, including Monthly BAWAP rainfall, Tmax, Tmin, VPD, etc, and other source data including monthly Penman PET, Correlation coefficient data. Data were extracted from national datasets for the GLO subregion.

    The four CSV files and their contents are as described above.

    Dataset Citation

    Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.


  9. A geometric shape regularity effect in the human brain: fMRI dataset

    • openneuro.org
    Updated Mar 14, 2025
    Cite
    Mathias Sablé-Meyer; Lucas Benjamin; Cassandra Potier Watkins; Chenxi He; Maxence Pajot; Théo Morfoisse; Fosca Al Roumi; Stanislas Dehaene (2025). A geometric shape regularity effect in the human brain: fMRI dataset [Dataset]. http://doi.org/10.18112/openneuro.ds006010.v1.0.1
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Mathias Sablé-Meyer; Lucas Benjamin; Cassandra Potier Watkins; Chenxi He; Maxence Pajot; Théo Morfoisse; Fosca Al Roumi; Stanislas Dehaene
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A geometric shape regularity effect in the human brain: fMRI dataset

    Authors:

    • Mathias Sablé-Meyer*
    • Lucas Benjamin
    • Cassandra Potier Watkins
    • Chenxi He
    • Maxence Pajot
    • Théo Morfoisse
    • Fosca Al Roumi
    • Stanislas Dehaene

    *Corresponding author: mathias.sable-meyer@ucl.ac.uk

    Abstract

    The perception and production of regular geometric shapes is a characteristic trait of human cultures since prehistory, whose neural mechanisms are unknown. Behavioral studies suggest that humans are attuned to discrete regularities such as symmetries and parallelism, and rely on their combinations to encode regular geometric shapes in a compressed form. To identify the relevant brain systems and their dynamics, we collected functional MRI and magnetoencephalography data in both adults and six-year-olds during the perception of simple shapes such as hexagons, triangles and quadrilaterals. The results revealed that geometric shapes, relative to other visual categories, induce a hypoactivation of ventral visual areas and an overactivation of the intraparietal and inferior temporal regions also involved in mathematical processing, whose activation is modulated by geometric regularity. While convolutional neural networks captured the early visual activity evoked by geometric shapes, they failed to account for subsequent dorsal parietal and prefrontal signals, which could only be captured by discrete geometric features or by more advanced transformer models of vision. We propose that the perception of abstract geometric regularities engages an additional symbolic mode of visual perception.

    Notes about this dataset

    We separately share the MEG dataset at https://openneuro.org/datasets/ds006012. Below are some notes about the fMRI dataset of N=20 adult participants (sub-2xx, numbers between 204 and 223), and N=22 children (sub-3xx, numbers between 301 and 325).

    • The code for the analyses is provided at https://github.com/mathias-sm/AGeometricShapeRegularityEffectHumanBrain
      However, the analyses work from already preprocessed data. Since there is no custom code per se for the preprocessing, I have not included it in the repository. To preprocess the data as was done in the published article, here is the command and software information:
      • fMRIPrep version: 20.0.5
      • fMRIPrep command: /usr/local/miniconda/bin/fmriprep /data /out participant --participant-label <label> --output-spaces MNI152NLin6Asym:res-2 MNI152NLin2009cAsym:res-2
    • Defacing has been performed with bidsonym, running the pydeface masking and the nobrainer brain registration pipeline.
      The published analyses have been performed on the non-defaced data. I have checked data quality for all participants after defacing. In specific cases, I may be able to request permission to share the original, non-defaced dataset.
    • sub-325 was acquired by a different experimenter and defaced before being shared with the rest of the research team, hence the slightly different defacing mask. That participant was also preprocessed separately, using a more recent fMRIPrep version: 20.2.6.
    • The data associated with the children has a few missing files. Notably:
      1. sub-313 and sub-316 are missing one run of the localizer each
      2. sub-316 has no data at all for the geometry
      3. sub-308 has no usable data for the intruder task

    Since all of these participants still have some data to contribute to either task, all available files were kept in this dataset. The analysis code reflects these inconsistencies where required with specific exceptions.
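    Because a few runs are missing, it can help to enumerate the files actually present before analysis. A minimal sketch using pybids; the local dataset path is illustrative:

    from bids import BIDSLayout

    # List the available BOLD runs per subject so the missing files noted above
    # (e.g. sub-316's geometry task) are handled explicitly.
    layout = BIDSLayout("/path/to/ds006010")  # local download path (assumed)
    for sub in sorted(layout.get_subjects()):
        runs = layout.get(subject=sub, suffix="bold", extension=".nii.gz")
        print(sub, len(runs))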
  10. Data from: FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 2, 2024
    Cite
    Mais, Lisa (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Hirsch, Peter
    Rumberger, Josef Lorenz
    Kainmueller, Dagmar
    Kandarpa, Ramya
    Reinke, Annika
    Ihrke, Gudrun
    Managan, Claire
    Mais, Lisa
    Maier-Hein, Lena
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General

    For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

    Summary

    A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains, comprising:

    - 30 completely labeled (segmented) images
    - 71 partly labeled images
    - altogether ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30 and 60 min on average, yet a difficult one can take up to 4 hours)
    - to the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
    - a set of metrics and a novel ranking score for meaningful method benchmarking
    - an evaluation of three baseline methods in terms of the above metrics and score

    Abstract

    Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

    Dataset documentation:

    We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

    FISBe Datasheet

    Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

    Files

    fisbe_v1.0_{completely,partly}.zip

    contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

    fisbe_v1.0_mips.zip

    maximum intensity projections of all samples, for convenience.

    sample_list_per_split.txt

    a simple list of all samples and the subset they are in, for convenience.

    view_data.py

    a simple python script to visualize samples, see below for more information on how to use it.

    dim_neurons_val_and_test_sets.json

    a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

    Readme.md

    general information

    How to work with the image files

    Each sample consists of a single 3d MCFO image of neurons of the fruit fly. For each image, we provide a pixel-wise instance segmentation for all separable neurons. Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification). The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file. The segmentation mask for each neuron is stored in a separate channel. The order of dimensions is CZYX.

    We recommend working in a virtual environment, e.g., by using conda:

    conda create -y -n flylight-env -c conda-forge python=3.9
    conda activate flylight-env

    How to open zarr files

    Install the python zarr package:

    pip install zarr

    Open a zarr file with:

    import zarr
    raw = zarr.open("<path/to/sample>.zarr", mode='r', path="volumes/raw")
    seg = zarr.open("<path/to/sample>.zarr", mode='r', path="volumes/gt_instances")

    Optional:

    import numpy as np
    raw_np = np.array(raw)

    Zarr arrays are read lazily on-demand. Many functions that expect numpy arrays also work with zarr arrays. Optionally, the arrays can also explicitly be converted to numpy arrays.

    How to view zarr image files

    We recommend to use napari to view the image data.

    Install napari:

    pip install "napari[all]"

    Save the following Python script:

    import zarr, sys, napari

    raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")
    gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

    viewer = napari.Viewer(ndisplay=3)
    for idx, gt in enumerate(gts):
        viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
    viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
    viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
    viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
    napari.run()

    Execute:

    python view_data.py /R9F03-20181030_62_B5.zarr

    Metrics

    S: Average of avF1 and C

    avF1: Average F1 Score

    C: Average ground truth coverage

    clDice_TP: Average true positives clDice

    FS: Number of false splits

    FM: Number of false merges

    tp: Relative number of true positives

    For more information on our selected metrics and formal definitions please see our paper.

    Baseline

    To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al. For detailed information on the methods and the quantitative results please see our paper.

    License

    The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation

    If you use FISBe in your research, please use the following BibTeX entry:

    @misc{mais2024fisbe,
      title         = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures},
      author        = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
      year          = 2024,
      eprint        = {2404.00130},
      archivePrefix = {arXiv},
      primaryClass  = {cs.CV}
    }

    Acknowledgments

    We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions. P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program. This work was co-funded by Helmholtz Imaging.

    Changelog

    There have been no changes to the dataset so far. All future changes will be listed on the changelog page.

    Contributing

    If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

    All contributions are welcome!

  11. BioTISR: a time-lapse biological image dataset for super-resolution microscopy

    • zenodo.org
    Updated Nov 7, 2024
    Cite
    Chang Qiao; Chang Qiao; Wencong Xu; Wencong Xu (2024). BioTISR: a time-lapse biological image dataset for super-resolution microscopy [Dataset]. http://doi.org/10.5281/zenodo.13984825
    Available download formats: bin
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chang Qiao; Chang Qiao; Wencong Xu; Wencong Xu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BioTISR is a biological image dataset for super-resolution (SR) microscopy, currently including 2D and 3D time-lapse image pairs of low- and high-resolution images of a variety of biological structures. It aims to provide a high-quality dataset of time-lapse biological SR images for the community, to spark more developments of computational SR methods.

    At present, the 2D dataset includes five specimens (clathrin-coated pits, lysosomes, outer mitochondrial membrane, microtubules, and F-actin) acquired with the GI/TIRF-SIM mode and the nonlinear SIM mode of our Multi-SIM system, and the 3D dataset includes three specimens (outer mitochondrial membrane, microtubules, and F-actin) acquired with the 3D-SIM mode of the Multi-SIM system. For each type of specimen and each imaging modality, we acquired raw data from at least 50 distinct regions of interest (ROIs). For each ROI, we acquired two (3D data) or three (2D data) groups of N-phase × M-orientation × T-timepoint raw images with a constant exposure time but increasing excitation light intensity, where (N, M, T) are (3, 3, 20) for TIRF-SIM and GI-SIM, (5, 5, 10) for nonlinear SIM, and (3, 5, 10) for 3D-SIM. Specific imaging conditions and scripts for reading MRC files are provided in the Supplement Files.
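    The dataset ships its own MRC-reading scripts in the Supplement Files; as a generic alternative, the open-source mrcfile Python package can read the format (the file name below is illustrative):

    import mrcfile

    # Read an MRC stack into a numpy array; for raw SIM data the leading axis
    # typically holds phases x orientations x timepoints.
    with mrcfile.open("example_raw_SIM.mrc", permissive=True) as mrc:
        data = mrc.data
        print(data.shape, data.dtype)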

    The BioTISR dataset is related to the following paper: Chang Qiao, Shuran Liu, Yuwang Wang, Wencong Xu, et al. "Time-lapse Image Super-resolution Neural Network with Reliable Confidence Evaluation for Optical Microscopy." bioRxiv 2024.05.04.592503 (2024), which is an extension of our previously published BioSR dataset (https://www.nature.com/articles/s41592-020-01048-5).

    Limited by quota, the original images uploaded in the current 3D dataset are wide-field images obtained by averaging 15 images, where (N, M, T) are (1, 1, 10). We will update them to raw SIM images after the quota is expanded.

    2D dataset URL:

    https://doi.org/10.5281/zenodo.13843670

    3D dataset URLs:

    F-actin:

    WF input: https://doi.org/10.5281/zenodo.13843673

    Raw SIM input: https://doi.org/10.5281/zenodo.13994464

    Microtubules:

    WF input: https://doi.org/10.5281/zenodo.13932988

    Raw SIM input: https://doi.org/10.5281/zenodo.13989327

    Mitochondria:

    WF input: https://doi.org/10.5281/zenodo.13843183

    Raw SIM input: https://doi.org/10.5281/zenodo.14000502

  12. Z

    Data from: 359,569 commits with source code density; 1149 commits of which...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Hönel, Sebastian (2020). 359,569 commits with source code density; 1149 commits of which have software maintenance activity labels (adaptive, corrective, perfective) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2590518
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Hönel, Sebastian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comes as SQL-importable file and is compatible with the widely available MariaDB- and MySQL-databases.

    It is based on (and incorporates/extends) the dataset "1151 commits with software maintenance activity labels (corrective, perfective, adaptive)" by Levin and Yehudai (https://doi.org/10.5281/zenodo.835534).

    The extensions to this dataset were obtained using Git-Tools, a tool that is included in the Git-Density (https://doi.org/10.5281/zenodo.2565238) suite. For each of the projects in the original dataset, Git-Tools was run in extended mode.

    The dataset contains these tables:

    x1151: The original dataset from Levin and Yehudai.

    Despite its name, this dataset has only 1,149 commits, as two commits were duplicates in the original dataset.

    This dataset spanned 11 projects, each of which had between 99 and 114 commits.

    This dataset has 71 features and spans the projects RxJava, hbase, elasticsearch, intellij-community, hadoop, drools, Kotlin, restlet-framework-java, orientdb, camel and spring-framework.

    gtools_ex (short for Git-Tools, extended)

    Contains 359,569 commits, analyzed using Git-Tools in extended mode

    It spans all commits and projects from the x1151 dataset as well.

    All 11 projects were analyzed from the initial commit until the end of January 2019. For the projects IntelliJ and Kotlin, only the first 35,000 and 30,000 commits, respectively, were analyzed.

    This dataset introduces 35 new features (see list below), 22 of which are size- or density-related.

    The dataset contains these views:

    geX_L (short for Git-tools, extended, with labels)

    Joins the commits' labels from x1151 with the extended attributes from gtools_ex, using the commits' hashes.

    jeX_L (short for joined, extended, with labels)

    Joins the datasets x1151 and gtools_ex entirely, based on the commits' hashes.
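    As a minimal sketch of querying these views once the dump has been imported into a local MariaDB/MySQL instance (the connection parameters and the "label" column name are placeholders, not part of the dataset documentation):

    import mysql.connector  # pip install mysql-connector-python

    # Placeholder connection parameters for a local import of the SQL dump.
    conn = mysql.connector.connect(
        host="localhost", user="root", password="secret", database="commits"
    )
    cur = conn.cursor()

    # Average source code density per maintenance-activity label, via the
    # geX_L view (labels from x1151 joined with gtools_ex attributes).
    # The column name "label" is an assumption; check the imported schema.
    cur.execute("SELECT label, COUNT(*), AVG(Density) FROM geX_L GROUP BY label")
    for label, n, avg_density in cur.fetchall():
        print(f"{label}: {n} commits, mean density {avg_density:.3f}")

    conn.close()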

    Features of the gtools_ex dataset:

    SHA1

    RepoPathOrUrl

    AuthorName

    CommitterName

    AuthorTime (UTC)

    CommitterTime (UTC)

    MinutesSincePreviousCommit: Double, describing the number of minutes that passed since the previous commit. "Previous" refers to the parent commit, not the previous commit in time.

    Message: The commit's message/comment

    AuthorEmail

    CommitterEmail

    AuthorNominalLabel: All authors of a repository are analyzed and merged by Git-Density using a heuristic, even if they do not always use the same email address or name. This label is a unique string that helps identify the same author across commits, even if the author did not always use the exact same identity.

    CommitterNominalLabel: The same as AuthorNominalLabel, but for the committer this time.

    IsInitialCommit: A boolean indicating whether a commit is an initial commit, i.e., whether it has no parent.

    IsMergeCommit: A boolean indicating whether a commit has more than one parent.

    NumberOfParentCommits

    ParentCommitSHA1s: A comma-concatenated string of the parents' SHA1 IDs

    NumberOfFilesAdded

    NumberOfFilesAddedNet: Like the previous property, but if the net-size of all changes of an added file is zero (i.e. when adding a file that is empty/whitespace or does not contain code), then this property does not count the file.

    NumberOfLinesAddedByAddedFiles

    NumberOfLinesAddedByAddedFilesNet: Like the previous property, but counts the net-lines

    NumberOfFilesDeleted

    NumberOfFilesDeletedNet: Like the previous property, but considers only files that had net-changes

    NumberOfLinesDeletedByDeletedFiles

    NumberOfLinesDeletedByDeletedFilesNet: Like the previous property, but counts the net-lines

    NumberOfFilesModified

    NumberOfFilesModifiedNet: Like the previous property, but considers only files that had net-changes

    NumberOfFilesRenamed

    NumberOfFilesRenamedNet: Like the previous property, but considers only files that had net-changes

    NumberOfLinesAddedByModifiedFiles

    NumberOfLinesAddedByModifiedFilesNet: Like the previous property, but counts the net-lines

    NumberOfLinesDeletedByModifiedFiles

    NumberOfLinesDeletedByModifiedFilesNet: Like the previous property, but counts the net-lines

    NumberOfLinesAddedByRenamedFiles

    NumberOfLinesAddedByRenamedFilesNet: Like the previous property, but counts the net-lines

    NumberOfLinesDeletedByRenamedFiles

    NumberOfLinesDeletedByRenamedFilesNet: Like the previous property, but counts the net-lines

    Density: The ratio between the sum of all net lines added+deleted+modified+renamed and the corresponding gross sum. A density of zero means that the sum of net lines is zero (i.e., all changed lines were just whitespace, comments, etc.). A density of 1 means that all changed net lines contribute to the gross size of the commit (i.e., no useless lines containing only comments or whitespace).

    AffectedFilesRatioNet: The ratio between the sums of NumberOfFilesXXX and NumberOfFilesXXXNet

    This dataset supports the paper "Importance and Aptitude of Source code Density for Commit Classification into Maintenance Activities", as submitted to the QRS 2019 conference (The 19th IEEE International Conference on Software Quality, Reliability, and Security). Citation: Hönel, S., Ericsson, M., Löwe, W. and Wingkvist, A., 2019. Importance and Aptitude of Source code Density for Commit Classification into Maintenance Activities. In The 19th IEEE International Conference on Software Quality, Reliability, and Security.

  13. w

    Dataset of book subjects that contain Multi-step multi-input one-way quantum...

    • workwithdata.com
    Updated Aug 10, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain Multi-step multi-input one-way quantum information processing with spatial and temporal modes of light [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Multi-step+multi-input+one-way+quantum+information+processing+with+spatial+and+temporal+modes+of+light&j=1&j0=books
    Explore at:
    Dataset updated
    Aug 10, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 3 rows and is filtered to the book "Multi-step multi-input one-way quantum information processing with spatial and temporal modes of light". It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.

  14. 🎹 Spotify Tracks Dataset

    • kaggle.com
    Updated Oct 22, 2022
    Cite
    MaharshiPandya (2022). 🎹 Spotify Tracks Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/4372070
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 22, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    MaharshiPandya
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Content

    This is a dataset of Spotify tracks spanning 125 different genres. Each track has a set of audio features associated with it. The data is in CSV format, which is tabular and can be loaded quickly.

    Usage

    The dataset can be used for:

    • Building a Recommendation System based on some user input or preference
    • Classification purposes based on audio features and available genres
    • Any other application that you can think of. Feel free to discuss!

    Column Description

    • track_id: The Spotify ID for the track
    • artists: The artists' names who performed the track. If there is more than one artist, they are separated by a ;
    • album_name: The album name in which the track appears
    • track_name: Name of the track
    • popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by an algorithm and is based, for the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.
    • duration_ms: The track length in milliseconds
    • explicit: Whether or not the track has explicit lyrics (true = yes it does; false = no it does not OR unknown)
    • danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable
    • energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale
    • key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1
    • loudness: The overall loudness of a track in decibels (dB)
    • mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0
    • speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks
    • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic
    • instrumentalness: Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content
    • liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live
    • valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)
    • tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration
    • time_signature: An estimated time signature. The time signature (meter) is a notational convention specifying how many beats are in each bar (or measure). The value ranges from 3 to 7, indicating time signatures from 3/4 to 7/4.
    • track_genre: The genre in which the track belongs
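
    As a minimal sketch of loading the CSV and summarizing the columns above with pandas (the filename is a placeholder for the file as downloaded from Kaggle):

    import pandas as pd

    # Placeholder filename; use the CSV as downloaded from Kaggle.
    df = pd.read_csv("spotify_tracks.csv")

    # Mean danceability and energy per genre: one simple way to use the audio
    # features for exploration or as inputs to a classifier/recommender.
    summary = (
        df.groupby("track_genre")[["danceability", "energy"]]
        .mean()
        .sort_values("danceability", ascending=False)
    )
    print(summary.head(10))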

    Acknowledgement

    Image credits: BPR world

  15. COVID-19 High Frequency Phone Survey of Households 2020, Round 2 - Viet Nam

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 26, 2023
    + more versions
    Cite
    World Bank (2023). COVID-19 High Frequency Phone Survey of Households 2020, Round 2 - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/4061
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    Time period covered
    2020
    Area covered
    Vietnam
    Description

    Geographic coverage

    National, regional

    Analysis unit

    Households

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The 2020 Vietnam COVID-19 High Frequency Phone Survey of Households (VHFPS) uses a nationally representative household survey from 2018 as the sampling frame. The 2018 baseline survey includes 46,980 households from 3,132 communes (about 25% of all communes in Vietnam). In each commune, one enumeration area (EA) is randomly selected, and 15 households are then randomly selected within each EA for interview. We use the large module of the baseline survey to select the households for official interview in the VHFPS survey, and the small-module households as a reserve for replacement. After data processing, the final sample size for Round 2 is 3,935 households.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The questionnaire for Round 2 consisted of the following sections:

    Section 2. Behavior
    Section 3. Health
    Section 5. Employment (main respondent)
    Section 6. Coping
    Section 7. Safety Nets
    Section 8. FIES

    Cleaning operations

    Data cleaning began during the data collection process. Inputs for the cleaning process include interviewers' notes following each question item, interviewers' notes at the end of the tablet form, and supervisors' notes made during monitoring. The data cleaning process was conducted in the following steps:

    • Append households interviewed in ethnic minority languages to the main dataset interviewed in Vietnamese.
    • Remove unnecessary variables which were automatically calculated by SurveyCTO.
    • Remove household duplicates in the dataset where the same form was submitted more than once.
    • Remove observations of households which were not supposed to be interviewed according to the identified replacement procedure.
    • Format variables according to their object type (string, integer, decimal, etc.).
    • Read through interviewers' notes and make adjustments accordingly. During interviews, whenever interviewers find it difficult to choose a correct code, they are advised to choose the most appropriate one and write down the respondent's answer in detail, so that the survey management team can decide which code best suits that answer.
    • Correct data based on supervisors' notes where enumerators entered a wrong code.
    • Recode the answer option "Other, please specify". This option is usually followed by a blank line allowing enumerators to type or write text specifying the answer. The data cleaning team checked this type of answer thoroughly to decide whether each answer needed recoding into one of the available categories or should be kept as originally recorded. In some cases, an answer could be assigned a completely new code if it appeared many times in the survey dataset.
    • Examine the accuracy of outlier values, defined as values that lie outside the 5th and 95th percentiles, by listening to interview recordings.
    • Final check on matching the main dataset with the different sections; where information is asked at the individual level, it is kept in separate data files in long form.
    • Label variables using the full question text.
    • Label variable values where necessary.

  16. Power Transformers FDD and RUL

    • kaggle.com
    zip
    Updated Sep 1, 2024
    Cite
    Iurii Katser (2024). Power Transformers FDD and RUL [Dataset]. https://www.kaggle.com/datasets/yuriykatser/power-transformers-fdd-and-rul
    Explore at:
    zip (33,405,750 bytes)
    Dataset updated
    Sep 1, 2024
    Authors
    Iurii Katser
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Datasets with dissolved gas concentrations in power transformer oil for remaining useful life (RUL) and fault detection and diagnosis (FDD) problems.

    Introduction

    Power transformers (PTs) are an important component of a nuclear power plant (NPP). They convert alternating voltage and are instrumental in the power supply of both external NPP energy consumers and NPPs themselves. Currently, many PTs have exceeded their planned service life, which had already been extended beyond the designated 25 years. Because of this extension, monitoring the technical condition of PTs becomes an urgent matter.

    An important method for monitoring and diagnosing PTs is Chromatographic Analysis of Dissolved Gas (CADG). It is based on the principle of forced extraction and analysis of dissolved gases from PT oil. Almost all types of equipment defects are accompanied by the formation of gases that dissolve in the oil; certain types of defects generate certain gases in different quantities. The concentrations also differ at the various stages of defect development, which makes it possible to calculate the RUL of the PT. At present, NPP control and diagnostic systems for PT equipment use predefined control limits for the concentration of dissolved gases in oil. The main disadvantages of this approach are the lack of automatic control and the insufficient quality of diagnostics, especially for PTs with extended service life. To combat these shortcomings, machine learning (ML) methods can be used in diagnostic systems for the analysis of data obtained using CADG, as they are used in the diagnostics of many other NPP components.

    Data description

    The datasets are available as .csv files, each containing 420 records of gas concentrations presented as a time series. The gases are H2, CO, C2H4, and C2H2. The period between time points is 12 hours. There are 3,000 datasets split into train (2,100 datasets) and test (900 datasets) sets.

    For the RUL problem, annotations are available (in separate files): each .csv file corresponds to a value, in points, equal to the time remaining until the equipment fails, counted from the end of the record.

    For FDD problems, there are labels (in separate files) with four PT operating modes (classes):

    1. Normal mode (2,436 datasets);
    2. Partial discharge: local dielectric breakdown in gas-filled cavities (127 datasets);
    3. Low-energy discharge: sparking or arc discharges in poor contact connections of structural elements with different or floating potential; discharges between PT core structural elements, high-voltage winding taps and the tank, or high-voltage winding and grounding; discharges in oil during contact switching (162 datasets);
    4. Low-temperature overheating: oil flow disruption in winding cooling channels or the magnetic system, causing low efficiency of the cooling system at temperatures < 300 °C (275 datasets).

    Data in this repository is an extension (test set added) of data from here and here.

    FDD problems statement

    In our case, the fault detection problem becomes a classification problem, since each record belongs to one of four labeled classes (one normal and three anomalous), so the model's output needs to be a class number. The problem can be stated either as binary classification (healthy/anomalous) for fault detection, or as multi-class classification (one of the 4 states) for fault diagnosis, as sketched below.
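
    A minimal sketch of the multi-class formulation follows (the paths, file layout, and the flattening of each 420-point record into a feature vector are assumptions; adapt them to the actual .csv and label files):

    import glob
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical layout: one .csv per record with columns H2, CO, C2H4, C2H2,
    # plus a labels file mapping each record to one of the four classes.
    X = np.stack([
        pd.read_csv(path).to_numpy().ravel()  # flatten 420 x 4 -> 1,680 features
        for path in sorted(glob.glob("train/*.csv"))
    ])
    y = pd.read_csv("train_labels.csv")["label"].to_numpy()

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    print(cross_val_score(clf, X, y, cv=5).mean())  # rough baseline accuracy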

    RUL problem statement

    To ensure high-quality maintenance and repair, it is vital to be aware of potential malfunctions and to predict the RUL of transformer equipment. Therefore, it is necessary to create a mathematical model that determines the RUL from the final 420 points.

    Data usage examples

    • The dataset was used in this article.
    • The dataset was used in this research by Katser et al., which addresses the problem by proposing an ensemble of classifiers.
  17. d

    HIRENASD Comparisons of FEM modal frequencies and modeshapes

    • catalog.data.gov
    • data.wu.ac.at
    Updated Apr 10, 2025
    + more versions
    Cite
    Dashlink (2025). HIRENASD Comparisons of FEM modal frequencies and modeshapes [Dataset]. https://catalog.data.gov/dataset/hirenasd-comparisons-of-fem-modal-frequencies-and-modeshapes
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Below are frequency comparisons of different models with experiment. Note that modeshapes are not very descriptive for the higher modes; there is coupling between them, so this is only an approximate naming scheme. See the modeshape plots for more details. PDF files with figures of the modeshapes are provided for the selected FEMs: TET10 model (Nov 2011) (CASE 10), Hex8 modeshapes (CASE 4), TET10 without model cart (CASE 5), HIRENASD TET model with model cart (new OML), and HIRENASD HEX8 wing-only model, with mode-by-mode comparisons for modes 1 through 12.

  18. m

    Graphite//LFP synthetic duty cycle dataset

    • data.mendeley.com
    • narcis.nl
    Updated Mar 12, 2021
    Cite
    Matthieu Dubarry (2021). Graphite//LFP synthetic duty cycle dataset [Dataset]. http://doi.org/10.17632/6s6ph9n8zg.2
    Explore at:
    Dataset updated
    Mar 12, 2021
    Authors
    Matthieu Dubarry
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This training dataset was calculated using the mechanistic modeling approach. See "Big data training data for artificial intelligence-based Li-ion diagnosis and prognosis" (Journal of Power Sources, Volume 479, 15 December 2020, 228806) and "Analysis of Synthetic Voltage vs. Capacity Datasets for Big Data Diagnosis and Prognosis" (Energies, under review) for more details.

    For this proof-of-concept work, we considered eight parameters to scan. For each degradation mode, degradation was chosen to follow equation (1):

    %degradation = a × cycle + (exp(b × cycle) − 1) (1)

    Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added: a delay for the exponential factor for LLI, and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating cannot be explained by an increase of LAMs or resistance [55]. The chosen parameters and their values are summarized in Table S1, and their evolution is represented in Figure S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles; this guarantees at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive, and, after 3,000 cycles, the lowest degradation is 3%. The reversibility factor p8 was applied with equation (2) when LAM_NE > PT.

    %LLI = %LLI + p8 × (LAM_PE − PT) (2)

    Where PT was calculated with equation (3) from [60].

    PT = 100 − ((100 − LAM_PE) / (100 × LR_ini − LAM_PE)) × (100 − OFS_ini − LLI) (3)

    Varying all these parameters accounted for close to 130,000 individual duty cycles, with one voltage curve for every 200 cycles (one every 10 cycles for cycles 1-200).
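
    As a minimal sketch of the degradation schedule in equation (1) and the plating threshold in equation (3) (the coefficients and inputs below are illustrative placeholders, not the values from Table S1):

    import numpy as np

    # Equation (1): %degradation = a * cycle + (exp(b * cycle) - 1)
    def degradation(cycle, a, b):
        return a * cycle + (np.exp(b * cycle) - 1.0)

    # Placeholder coefficients, chosen so degradation reaches roughly 20%
    # after 3,000 cycles.
    cycles = np.arange(0, 3001, 10)
    lli = degradation(cycles, a=0.0065, b=0.0003)

    # Equation (3): plating threshold PT (inputs in %, LR_ini dimensionless;
    # all values here are illustrative).
    def plating_threshold(lam_pe, lr_ini, ofs_ini, lli_now):
        return 100 - ((100 - lam_pe) / (100 * lr_ini - lam_pe)) * (100 - ofs_ini - lli_now)

    print(lli[-1], plating_threshold(10.0, 1.1, 5.0, lli[-1]))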

    This dataset requires the associated V vs. Q dataset (10.17632/bs2j56pn7y.2) to be functional. See the readme for details on variables and examples.

  19. CATS-ISS Level 2 Operational Day Mode 7.2 Version 3-01 5 km Layer - Dataset...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 1, 2025
    Cite
    data.nasa.gov (2025). CATS-ISS Level 2 Operational Day Mode 7.2 Version 3-01 5 km Layer - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/cats-iss-level-2-operational-day-mode-7-2-version-3-01-5-km-layer-3266c
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    CATS-ISS_L2O_D-M7.2-V3-01_05kmLay is the Cloud-Aerosol Transport System (CATS) International Space Station (ISS) Level 2 Operational Day Mode 7.2 Version 3-01 5 km Layer data product. This collection spans March 25, 2015 to October 29, 2017. CATS, which was launched on January 10, 2015, was a lidar remote sensing instrument that provided range-resolved profile measurements of atmospheric aerosols and clouds from the ISS. CATS was intended to operate on-orbit for up to three years. CATS provided vertical profiles at three wavelengths, orbiting between ~230 and ~270 miles above the Earth's surface at a 51-degree inclination with nearly a three-day repeat cycle. For the first time, scientists were able to study diurnal (day-to-night) changes in cloud and aerosol effects from space by observing the same spot on Earth at different times each day. CATS Level 2 Layer data products contain geophysical parameters and are derived from Level 1 data at 60 m vertical and 5 km horizontal resolution.

  20. Multi-Mission Optimally Interpolated Sea Surface Salinity Global Dataset V2

    • s.cnmilf.com
    • podaac.jpl.nasa.gov
    • +4more
    Updated Apr 24, 2025
    + more versions
    Cite
    NASA/JPL/PODAAC;UHI/SOEST/IPRC (2025). Multi-Mission Optimally Interpolated Sea Surface Salinity Global Dataset V2 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/multi-mission-optimally-interpolated-sea-surface-salinity-global-dataset-v2-9337a
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    This is a Level 4 product on a 0.25-degree spatial and 4-day temporal grid. The product is derived from the Level 2 swath data of three satellite missions: Aquarius/SAC-D, Soil Moisture Active Passive (SMAP), and Soil Moisture and Ocean Salinity (SMOS), using Optimal Interpolation (OI) with a 7-day decorrelation time scale. The product offers a continuous record from August 28, 2011 to the present by concatenating the measurements from Aquarius (September 2011 to June 2015) and SMAP (April 2015 to present). ESA's SMOS data was used to fill the gap in SMAP data between June and July 2019, when the SMAP satellite was in safe mode. The two-month overlap (April to June 2015) between Aquarius and SMAP was used to ensure consistency and continuity in the data record. The product covers the global ocean, including the Arctic and Antarctic in areas free of sea ice, but does not cover internal seas such as the Mediterranean and Baltic Seas. In-situ salinity from Argo floats and moored buoys is used to derive a large-scale bias correction and to ensure the consistency and accuracy of the OISSS dataset. This dataset is produced by Earth and Space Research (ESR), Seattle, WA, and the International Pacific Research Center (IPRC) of the University of Hawaii at Manoa, in collaboration with Remote Sensing Systems (RSS), Santa Rosa, California. More details can be found in the user's guide.
