By VISHWANATH SESHAGIRI [source]
This dataset contains YouTube video and channel metadata for analyzing the statistical relationships between videos and forming a topic tree. With 9 direct features and 13 derived features, it includes information such as total views per unit time, channel views, likes/subscribers ratio, comments/views ratio, and dislikes/subscribers ratio. The data provides a unique opportunity to gain insights on topics such as subscriber count trends over time or the impact of trends on subscriber engagement. Powerful models can be developed to show how different types of content drive viewership and to identify the most popular styles or topics within YouTube's vast catalogue. The data also offers an intriguing look into consumer behaviour: by analyzing ratios such as likes per subscriber and dislikes per view, one can explore what drives people to watch specific videos at certain times, or to favour certain channels over others. Finally, this dataset is completely open source with an easy-to-understand GitHub repo, making it a valuable resource for anyone looking to gain better insights into how their audience interacts with their content and how they might improve it in the future.
How to Use This Dataset
In general, it is important to understand each parameter in the dataset before proceeding with analysis. The parameters included are: totalviews/channelelapsedtime, channelViewCount, likes/subscriber, views/subscribers, subscriberCount, dislikes/views, comments/subscriber, channelCommentCount, likes/dislikes, comments/views, dislikes/subscribers, totviews/totsubs, and views/elapsedtime.
To use this dataset for your own analysis: 1) review each parameter's meaning and purpose in the dataset; 2) get familiar with basic descriptive statistics such as mean, median, mode, and range; 3) create visualizations or tables based on subsets of the data; 4) examine correlations between different sets of variables or parameters; 5) draw meaningful conclusions about specific channels or topics based on organized graph hierarchies or tables; 6) analyze trends over time for individual parameters as well as the aggregate reaction from all users when videos are released.
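As a starting point, here is a minimal pandas sketch of steps 2 and 4 (the file name comes from the file listing below; the column names are assumed to match the data dictionary):

import pandas as pd

# Load the dataset (file name as given in the file listing below).
df = pd.read_csv("YouTubeDataset_withChannelElapsed.csv")

# Step 2: basic descriptive statistics for every numeric column.
print(df.describe())

# Step 4: correlations between a few engagement parameters; the column
# names are assumed to match the data dictionary (e.g. "likes/subscriber").
cols = ["totalviews/channelelapsedtime", "channelViewCount", "likes/subscriber"]
print(df[cols].corr())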
Predicting the Relative Popularity of Videos: This dataset can be used to build a statistical model that can predict the relative popularity of videos based on various factors such as total views, channel viewers, likes/dislikes ratio, and comments/views ratio. This model could then be used to make recommendations and predict which videos are likely to become popular or go viral.
Creating Topic Trees: The dataset can also be used to create topic trees or taxonomies by analyzing the content of videos and looking at what topics they cover. For example, one could analyze the most popular YouTube channels in a specific subject area, group together those that discuss similar topics, and then build an organized tree structure around those topics in order to better understand viewer interests in that area.
Viewer Engagement Analysis: This dataset could also be used for viewer engagement analysis by examining factors such as subscriber count, average time spent watching a video per user (elapsed time), and comments made per view, so as to gain insights into how engaged viewers are with specific content or channels on YouTube. From this information it would be possible to optimize content strategy accordingly in order to improve overall engagement rates across various types of video content and channel types.
If you use this dataset in your research, please credit the original authors.
License
Unknown License - Please check the dataset description for more information.
File: YouTubeDataset_withChannelElapsed.csv

| Column name | Description |
|:---|:---|
| totalviews/channelelapsedtime | Ratio of total views to channel elapsed time. (Ratio) |
| channelViewCount | Total number of views for the channel. (Integer) |
| likes/subscriber | ... |
Data Set Overview =================
Version 1.1 ===========
This Version 1.1 Data Set replaces the Version 1.0 Data Set (DATA_SET_ID = VG2-J-PLS-5-ION-MOM-96.0SEC) previously archived with the PDS. The old Version of the Data Set was given a new DATA_SET_ID and intentionally marked as V1.0.
Data Set Description ====================
This Data Set contains the best Estimates of the total Ion Density from Voyager 2 at Jupiter in the PLS Voltage Range (10 eV/Q to 5950 eV/Q). It is calculated using the Method of (McNutt et al., 1981), which to first order consists of taking the total measured Current and dividing by the Collector Area and Plasma Bulk Velocity. This Method is only accurate for high Mach Number Flows directly into the Detector, and may result in Underestimates of the total Density by a Factor of 2 in the outer Magnetosphere. Thus absolute Densities should be treated with caution, but Density Variations in the Data Set can be trusted. The low resolution Mode Density is used at all times except Day 190 at 2100-2200, when the Larger of the high and low resolution Mode Densities in a 96 s Period is used. Corotation is assumed inside L=17.5, and a constant Velocity Component of 200 km/s into the D Cup is used outside of this. These are the Densities given in the (McNutt et al., 1981) Paper corrected by a Factor of 1.209 (0.9617) for Densities obtained from the Side (Main) Sensor. This Correction is due to a better Calculation of the effective Area of the Sensors. Data Format: Column 1 is Time (yyyy-mm-ddThh:mm:ss.sssZ), Column 2 is the Moment Density in cm^-3. Each Row has Format (a24,1x,1pe9.2). Values of -9.99e+10 indicate that the Parameter could not be obtained from the Data using the standard Analysis Technique. Additional Information about this Data Set and the Instrument which produced it can be found elsewhere in this Catalog. An Overview of the Data in this Data Set can be found in (McNutt et al., 1981) and a complete Instrument Description can be found in (Bridge et al., 1977).
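For illustration, a minimal Python sketch for reading rows in the stated (a24,1x,1pe9.2) format; the file name is hypothetical:

import numpy as np

# Each row: a 24-character UTC time stamp (a24), one space (1x),
# then the moment density in cm^-3 in scientific notation (1pe9.2).
rows = []
with open("ion_density_96s.tab") as fh:  # hypothetical file name
    for line in fh:
        time_str = line[0:24]
        density = float(line[25:34])
        rows.append((time_str, density))

# Values of -9.99e+10 flag densities the standard analysis could not obtain.
densities = np.array([d for _, d in rows])
valid = densities[densities != -9.99e10]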
+---------------------------+
| Processing Level ID | 5   |
| Software Flag       | Y   |
+---------------------------+
Parameters ==========
Ion Density ===========
+-------------------------------------------------+
| Sampling Parameter Name        | TIME           |
| Data Set Parameter Name        | ION DENSITY    |
| Sampling Parameter Resolution  | 96.000000      |
| Sampling Parameter Interval    | 96.000000      |
| Minimum Available Sampling Int | 96.000000      |
| Data Set Parameter Unit        | CM^-3          |
| Sampling Parameter Unit        | SECOND         |
+-------------------------------------------------+
A derived Parameter equaling the Number of Ions per Unit Volume over a specified Range of Ion Energy, Energy per Charge, or Energy per Nucleon. Discrimination with regard to Mass and/or Charge State is necessary to obtain this Quantity; however, Mass and Charge State are often assumed due to Instrument Limitations.
Many different Forms of Ion Density are derived. Some are distinguished by their Composition (N+, Proton, Ion, etc.) or their Method of Derivation (Maxwellian Fit, Method of Moments). In some cases, more than one Type of Density will be provided in a single Data Set. In general, if more than one Ion Species is analyzed, either by Moment or Fit, a total Density will be provided which is the Sum of the Ion Densities. If a Plasma Component does not have a Maxwellian Distribution the actual Distribution can be represented as the Sum of several Maxwellians, in which case the Density of each Maxwellian is given.
Source Instrument Parameters ============================
+----------------------------------------------------------------+
| Instrument Host ID              | VG2                          |
| Data Set Parameter Name         | ION DENSITY                  |
| Instrument Parameter Name       | ION RATE                     |
|                                 | ION CURRENT                  |
|                                 | MULTIPLE PARTICLE PARAMETERS |
| Important Instrument Parameters | 1 (for all Parameters)       |
+----------------------------------------------------------------+
Processing ==========
Processing History ==================
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
We are excited to release a high-quality mathematical reasoning dataset designed to improve large language models on complex problem-solving tasks. This dataset was developed specifically for the AIMO competition and focuses on enhancing reasoning reliability through executable Python tool calls.
Dataset Name: AIMO3-High-Difficulty-Tool-Calling-Dataset.jsonl
Our dataset is built on Nemotron-Math-v2 and is further refined through additional filtering and processing, as outlined below:
Problem Scope Filtering (Answer Constraints): we keep only problems whose reference answers are integers within the range [0, 99999].
Difficulty Filtering (Model Pass-Rate Constraint): we keep only problems for which gpt-oss-120b, under high reasoning mode, achieves ≤ 7 correct answers out of 8 sampling attempts.
Answer Regeneration (Method Inspired by a Public Notebook): we follow the approach used in the current top-ranked public AIMO3 notebook (https://www.kaggle.com/code/nihilisticneuralnet/44-50-aimo3-skills-optional-luck-required) and regenerate model answers for the retained problems.
Ongoing Expansion (More Samples for Harder Filtering): we are increasing the number of samples from 8 to 16 in order to identify and retain even more challenging problems.
Intended Use Cases: this dataset can be used for curriculum learning, supervised fine-tuning (SFT), and reinforcement learning (RL).
Each sample follows a structured JSON schema:
{
"metadata_infos": {
"problem": "Original problem",
"standard": "Reference answer",
"data_source": "Source (e.g., AoPS competition datasets)",
"reason_high_with_tool": {
"count": "Number of samplings (8)",
"pass": "Number of correct responses",
"accuracy": "Accuracy rate"
}
},
"text": "Harmony-formatted training text"
}
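As an illustration, a minimal sketch that streams the JSONL release and keeps only the hardest problems, assuming the schema above (the pass field is read as an integer; the threshold of 2 is arbitrary):

import json

hard = []
with open("AIMO3-High-Difficulty-Tool-Calling-Dataset.jsonl") as fh:
    for line in fh:
        sample = json.loads(line)
        stats = sample["metadata_infos"]["reason_high_with_tool"]
        # Keep problems solved in at most 2 of the 8 sampling attempts.
        if int(stats["pass"]) <= 2:
            hard.append(sample["text"])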
The dataset is released in Harmony format. Model responses can be generated directly by running:
sh collect.sh
allowing immediate use with GPT-OSS models without additional preprocessing.
from transformers import AutoTokenizer

# Load the GPT-OSS tokenizer and tokenize the Harmony-formatted training text.
tokenizer = AutoTokenizer.from_pretrained(
    "openai/gpt-oss-120b",
    trust_remote_code=True,
)
input_ids = tokenizer(data["text"])["input_ids"]
| Pass Count | Samples |
|---|---|
| 1 | 662 |
| 2 | 1050 |
| 3 | 1794 |
| 4 | 612 |
| 5 | 752 |
| 6 | 950 |
| 7 | 1473 |
| Total | 7293 |
The final dataset contains approximately 70,000 reasoning trajectories.
The dataset was built through a multi-stage pipeline, applying different retention strategies based on pass rates. This approach balances:

✅ reasoning diversity
✅ execution reliability
✅ problem difficulty
| Reasoning | Acc |
|---|---|
| reason_low_no_tool | 0.304077 |
| reason_low_with_tool | 0.470305 |
| reason_medium_no_tool | 0.579227 |
| reason_medium_with_tool | 0.675028 |
| reason_high_no_tool | 0.757720 |
| reason_high_with_tool | 0.794518 |
Based on the statistics presented in the table above (the data in the table is derived from the Nemotron-Math-v2 dataset), our analysis reveals a key insight:
Integrating Python tool calls into reasoning significantly improves mathematical accuracy in large language models.
Therefore, this dataset emphasizes executable reasoning rather than purely textual chains of thought.
Although sourced from public datasets, extensive curation was performed:
Value Range Filtering: ensures alignment with the AIMO3 distribution.
Pass Rate Filtering: only problems with ≤ 7 successful samples (out of 8) were retained to maintain high difficulty.
Trajectory Filtering: for relatively easier problems, only trajectories...
This dataset details service data and operating expenses, broken out by agency and mode, reported to the National Transit Database for the 2024 report year.
NTD Data Tables organize and summarize data from the 2024 National Transit Database in a manner that is more useful for quick reference and summary analysis.
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Data Set Overview =================
This Data Set gives the best available Values for Ion Densities, Temperatures, and Velocities near Neptune derived from Data obtained by the Voyager 2 Plasma Experiment. All Parameters are obtained by fitting the observed Spectra (Current as a Function of Energy) with Maxwellian Plasma Distributions, using a non-linear least squares fitting Routine to find the Plasma Parameters which, when coupled with the full Instrument Response, best simulate the Data. The PLS Instrument measures Energy per Charge, so Composition is not uniquely determined but can be deduced in some cases by the separation of the observed Current Peaks in Energy (assuming the Plasma is co-moving). In the upstream Solar Wind, Protons are fit to the M-long Data since high Energy Resolution is needed to obtain accurate Plasma Parameters. In the Magnetosheath the Ion Flux is so low that several L-long Spectra (3-5) had to be averaged to increase the signal-to-noise Ratio to a Level at which the Data could be reliably fit. These averaged Spectra were fit using two Proton Maxwellians with the same Velocity. The Values given in the upstream Magnetosheath are the total Density and the density-weighted Temperature. In both the upstream Solar Wind and Magnetosheath full vector Velocities, Densities and Temperatures are derived for each Fit Component. In the Magnetosphere, Spectra do not contain enough Information to obtain full Velocity Vectors, so Flow is assumed to be purely azimuthal. In some cases the azimuthal Velocity is a Fit Parameter, in some cases rigid Corotation is assumed. In the "outer" Magnetosphere (L>5) two distinct Current Peaks appear in the Spectra; these are fit assuming a Composition of H+ and N+. In the inner Magnetosphere the Plasma is hot and the Composition is ambiguous, although two superimposed Maxwellians are still required to fit the Data. These Spectra are fit using two Compositions, one with H+ and N+ and the second with two H+ Components. The N+ Composition is preferred by the Data Provider. All Fit Values in the Magnetosphere come with one Sigma Errors. It should be noted that no attempt has been made to account for the Spacecraft Potential, which is probably about -10 V in this Region and will affect the Density and Velocity Values. In the outbound Magnetosheath and Solar Wind both Moment and Fit Values are given for the Velocity, Density, and Thermal Speed. The signal-to-noise Ratio in the M-longs is very low, especially near the Magnetopause, which can result in the Analysis giving incorrect Values. The L-long Spectra have too low an Energy Resolution to permit accurate Determination of Parameters in many Regions; in particular the Temperature and non-radial Velocity Components may be inaccurate.
Parameters ==========
Derived Parameters ==================
+-------------------------------------------------------+
| Sampling Parameter Name             | TIME            |
| Sampling Parameter Resolution       | N/A             |
| Minimum Sampling Parameter          | UNK             |
| Maximum Sampling Parameter          | UNK             |
| Sampling Parameter Interval         | UNK             |
| Minimum Available Sampling Interval | UNK             |
| Data Set Parameter Name             | ION DENSITY     |
| Noise Level                         | UNK             |
| Data Set Parameter Unit             | cm^-3           |
+-------------------------------------------------------+
Ion Density: A derived Parameter equaling the Number of Ions per Unit Volume over a specified Range of Ion Energy, Energy per Charge, or Energy per Nucleon. Discrimination with regard to Mass and/or Charge State is necessary to obtain this Quantity; however, Mass and Charge State are often assumed due to Instrument Limitations.
Many different Forms of Ion Density are derived. Some are distinguished by their Composition (N+, Proton, Ion, etc.) or their Method of Derivation (Maxwellian Fit, Method of Moments). In some cases, more than one Type of Density will be provided in a single Data Set. In general, if more than one Ion Species is analyzed, either by Moment or Fit, a total Density will be provided which is the Sum of the Ion Densities. If a Plasma Component does not have a Maxwellian Distribution the actual Distribution can be represented as the Sum of several Maxwellians, in which case the Density of each Maxwellian is given.
+-------------------------------------------------------+
| Sampling Parameter Name             | TIME            |
| Sampling Parameter Resolution       | N/A             |
| Minimum Sampling Parameter          | UNK             |
| Maximum Sampling Parameter          | UNK             |
| Sampling Parameter Interval         | UNK             |
| Minimum Available Sampling Interval | UNK             |
| Data Set Parameter Name             | ION TEMPERATURE |
| Noise Level                         | UNK             |
| Data Set Parameter Unit             | EV              |
+-------------------------------------------------------+
Ion Temperature: A derived Parameter giving an Indication of the Mean Energy per Ion, assuming the Shape of the Ion Energy Spectrum to be Maxwellian (i.e. highest entropy shape). Given that the Ion Energy Spectrum is not exactly Maxwellian, the Ion Temperature can be defined integrally (whereby the Mean Energy obtained by integrating under the actual Ion Energy Spectrum is set equal to the Integral under a Maxwellian, where the Temperature is a free
General

For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

Summary
A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains
- 30 completely labeled (segmented) images
- 71 partly labeled images
- altogether comprising 600 expert-labeled neuron instances (labeling a single neuron takes between 30-60 min on average, yet a difficult one can take up to 4 hours)
- To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
- A set of metrics and a novel ranking score for meaningful method benchmarking
- An evaluation of three baseline methods in terms of the above metrics and score
Abstract

Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

Dataset documentation

We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire: FISBe Datasheet. Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

Files
fisbe_v1.0_{completely,partly}.zip
contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.
fisbe_v1.0_mips.zip
maximum intensity projections of all samples, for convenience.
sample_list_per_split.txt
a simple list of all samples and the subset they are in, for convenience.
view_data.py
a simple python script to visualize samples, see below for more information on how to use it.
dim_neurons_val_and_test_sets.json
a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.
Readme.md
general information
How to work with the image files

Each sample consists of a single 3d MCFO image of neurons of the fruit fly. For each image, we provide a pixel-wise instance segmentation for all separable neurons. Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification). The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file. The segmentation mask for each neuron is stored in a separate channel. The order of dimensions is CZYX.

We recommend working in a virtual environment, e.g., by using conda:

conda create -y -n flylight-env -c conda-forge python=3.9
conda activate flylight-env

How to open zarr files
Install the python zarr package: pip install zarr
Open a zarr file with:

import zarr

raw = zarr.open(path_to_zarr, mode='r', path="volumes/raw")
seg = zarr.open(path_to_zarr, mode='r', path="volumes/gt_instances")
Zarr arrays are read lazily on-demand. Many functions that expect numpy arrays also work with zarr arrays. Optionally, the arrays can also explicitly be converted to numpy arrays.

How to view zarr image files

We recommend using napari to view the image data.
Install napari: pip install "napari[all]"
Save the following Python script (e.g., as view_data.py):

import zarr, sys, napari

raw = zarr.load(sys.argv[1], path="volumes/raw")
gts = zarr.load(sys.argv[1], path="volumes/gt_instances")

viewer = napari.Viewer(ndisplay=3)
for idx, gt in enumerate(gts):
    viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
napari.run()
Execute: python view_data.py path-to-file/R9F03-20181030_62_B5.zarr
Metrics
S: Average of avF1 and C
avF1: Average F1 score
C: Average ground truth coverage
clDice_TP: Average true positives clDice
FS: Number of false splits
FM: Number of false merges
tp: Relative number of true positives
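As a minimal illustration of the aggregate score defined above (see the paper for the formal definitions of avF1 and C; this sketch is not the benchmark's own code):

def ranking_score(avF1: float, C: float) -> float:
    # S: average of the average F1 score and the average ground truth coverage.
    return 0.5 * (avF1 + C)

print(ranking_score(0.60, 0.70))  # 0.65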
For more information on our selected metrics and formal definitions please see our paper.

Baseline

To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al. For detailed information on the methods and the quantitative results please see our paper.

License

The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Citation

If you use FISBe in your research, please use the following BibTeX entry:

@misc{mais2024fisbe,
  title         = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures},
  author        = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
  year          = 2024,
  eprint        = {2404.00130},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

Acknowledgments

We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions. P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program. This work was co-funded by Helmholtz Imaging.

Changelog

There have been no changes to the dataset so far. All future changes will be listed on the changelog page.

Contributing

If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository. All contributions are welcome!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The do-file marital_spouselinks.do combines all data on people's marital statuses and reported spouses to create the following datasets:

1. all_marital_reports - a listing of all the times an individual has reported their current marital status, with the id numbers of the reported spouse(s); this listing is as reported, so it may include discrepancies (i.e. a 'Never married' status following a 'Married' one)
2. all_spouse_pairs_full - a listing of each time each spouse pair has been reported, plus summary information on co-residency for each pair
3. all_spouse_pairs_clean_summarised - summarises the data from all_spouse_pairs_full to give start and end dates of unions
4. marital_status_episodes - combines data from all the sources to create episodes of marital status; each episode has a start date, an end date and a marital status, and, if currently married, the spouse ids of the current spouse(s) if reported. There are several variables to indicate where each piece of information comes from.
The first 2 datasets are made available in case people need the 'raw' data for any reason (i.e. if they only want data from one study) or if they wish to summarise the data in a different way to what is done for the last 2 datasets.
The do-file is quite complicated, with many sources of data going through multiple processes to create variables in the datasets, so it is not always straightforward to explain in the documentation where each variable comes from. The 4 datasets build on each other and the do-file is documented throughout, so anyone wanting to understand the process in great detail may be better off examining the do-file itself. However, below is a brief description of how the datasets are created:
Marital status data are stored in the tables of the study they were collected in:

AHS Adult Health Study [ahs_ahs1]
CEN Census (initial CRS census) [cen_individ]
CENM In-migration (CRS migration form) [crs_cenm]
GP General form (filled for various reasons) [gp_gpform]
SEI Socio-economic individual (annual survey from 2007 onwards) [css_sei]
TBH TB household (study of household contacts of TB patients) [tb_tbh]
TBO TB controls (matched controls for TB patients) [tb_tbo & tb_tboto2007]
TBX TB cases (TB patients) [tb_tbx & tb_tbxto2007]

In many of the above surveys, as well as their current marital status, people were asked to report their current and past spouses along with (sometimes) some information about the marriage (start/end year etc.). These data are stored all together in the table gen_spouse, with variables indicating which study the data came from. Further evidence of spousal relationships is taken from gen_identity (if a couple appear as co-parents to a CRS member) and from crs_residency_episodes_clean_poly, a combined dataset (if they are living in the same household at the same time). Note that co-parent couples who are not reported in gen_spouse are only retained in the datasets if they have co-resident episodes.
The marital status data are appended together and the spouse id data merged in. Minimal data editing/cleaning is carried out. As the spouse data are in long format, this dataset is reshaped wide to have one line per marital status report (polygamy in the area allows for men to have multiple spouses at one time): this dataset is saved as all_marital_reports.
The list of reported spouses on gen_spouse is appended to a list of co-parents (from gen_identity) and this list is cleaned to try to identify and remove obvious id errors (incestuous links, same sex [these are not reported in this culture] and large age difference). Data reported by men and women are compared and variables created to show whether one or both of the couple report the union. Many records have information on start and end year of marriage, and all have the date the union was reported. This listing is compared to data from residency episodes to add dates that couples were living together (not all have start/end dates so this is to try to supplement this), in addition the dates that each member of the couple was last known to be alive or first known to be dead are added (from the residency data as well). This dataset with all the records available for each spouse pair is saved as all_spouse_pairs_full.
The date data from all_spouse_pairs_full are then summarised to get one line per couple with earliest and latest known married date for all, and, if available, marriage and separation date. For each date there are also variables created to indicate the source of the data.
As the culture only allows women to have one spouse at a time, records for women with 'overlapping' husbands are cleaned. This dataset is then saved as all_spouse_pairs_clean_summarised.
Both the cleaned spouse pairs and the cleaned marital status datasets are converted into episodes: the spouse listing uses the marriage date or first known married date as the beginning, and the last known married date plus a year or the separation date as the end; the marital status records are collapsed into periods of the same status being reported (following some cleaning to remove impossible reports), with the start date being the first of these reports and the end date being the last of the reports plus a year. These episodes are appended together and a series of processes is run several times to remove overlapping episodes. To be able to assign specific spouse ids to each married episode, some episodes need to be 'split' into more than one (i.e. if a man is married to one woman from 2005 to 2017 and then marries another woman in 2008 and remains married to her till 2017, his initial married episode would be from 2005 to 2017, but this would need to be split into one from 2005 to 2008, which would have 1 idspouse attached, and another from 2008 to 2017, which would have 2 idspouse attached). After this splitting process the spouse ids are merged in.
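To illustrate the splitting step with the example from the text, here is a hedged Python sketch (the do-file itself is Stata; the names and data structures below are hypothetical):

from typing import Dict, List, Tuple

def split_episodes(spouses: Dict[str, Tuple[int, int]]) -> List[Tuple[int, int, List[str]]]:
    # Cut the timeline at every year in which the set of current spouses changes.
    cuts = sorted({y for start, end in spouses.values() for y in (start, end)})
    episodes = []
    for start, end in zip(cuts, cuts[1:]):
        current = [sid for sid, (s, e) in spouses.items() if s <= start and e >= end]
        if current:
            episodes.append((start, end, current))
    return episodes

# Wife A reported 2005-2017, wife B reported 2008-2017:
print(split_episodes({"A": (2005, 2017), "B": (2008, 2017)}))
# [(2005, 2008, ['A']), (2008, 2017, ['A', 'B'])]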
The final episode dataset is saved as marital_status_episodes.
Individual
Face-to-face [f2f]
National, regional
Households
Sample survey data [ssd]
The 2020 Vietnam COVID-19 High Frequency Phone Survey of Households (VHFPS) uses a nationally representative household survey from 2018 as the sampling frame. The 2018 baseline survey includes 46,980 households from 3,132 communes (about 25% of total communes in Vietnam). In each commune, one enumeration area (EA) is randomly selected, and then 15 households are randomly selected in each EA for interview (3,132 × 15 = 46,980). The large-module households were used to select the households for the official VHFPS interview, with the small-module households held in reserve for replacement. After data processing, the final sample size for Round 2 is 3,935 households.
Computer Assisted Telephone Interview [cati]
The questionnaire for Round 2 consisted of the following sections:

Section 2. Behavior
Section 3. Health
Section 5. Employment (main respondent)
Section 6. Coping
Section 7. Safety Nets
Section 8. FIES
Data cleaning began during the data collection process. Inputs for the cleaning process include the interviewers’ notes following each question item, interviewers’ notes at the end of the tablet form, as well as supervisors’ notes made during monitoring. The data cleaning process was conducted in the following steps:
• Append households interviewed in ethnic minority languages with the main dataset interviewed in Vietnamese.
• Remove unnecessary variables which were automatically calculated by SurveyCTO
• Remove household duplicates in the dataset where the same form is submitted more than once.
• Remove observations of households which were not supposed to be interviewed following the identified replacement procedure.
• Format variables as their object type (string, integer, decimal, etc.)
• Read through interviewers’ note and make adjustment accordingly. During interviews, whenever interviewers find it difficult to choose a correct code, they are recommended to choose the most appropriate one and write down respondents’ answer in detail so that the survey management team will justify and make a decision which code is best suitable for such answer.
• Correct data based on supervisors’ note where enumerators entered wrong code.
• Recode the answer option “Other, please specify”. This option is usually followed by a blank line allowing enumerators to type or write text to specify the answer. The data cleaning team checked this type of answer thoroughly to decide whether each answer needed recoding into one of the available categories or should be kept as originally recorded. In some cases, an answer could be assigned a completely new code if it appeared many times in the survey dataset.
• Examine the data accuracy of outlier values, defined as values that lie below the 5th or above the 95th percentile, by listening to interview recordings (a sketch of this check appears after this list).
• Final check on matching the main dataset with the different sections; sections where information is asked at the individual level are kept in separate data files in long form.
• Label variables using the full question text.
• Label variable values where necessary.
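A minimal pandas sketch of the percentile-based outlier flag described in the cleaning steps above (the function and column names are illustrative, not the survey team's own code):

import pandas as pd

def flag_outliers(series: pd.Series) -> pd.Series:
    # Mark values below the 5th or above the 95th percentile for review.
    lo, hi = series.quantile(0.05), series.quantile(0.95)
    return (series < lo) | (series > hi)

# Example with a hypothetical numeric variable:
# df["review_income"] = flag_outliers(df["income"])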
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This training dataset was calculated using the mechanistic modeling approach. See the “Benchmark Synthetic Training Data for Artificial Intelligence-based Li-ion Diagnosis and Prognosis“ publication for more details. More details will be added when published. The prognosis dataset was harder to define as there are no limits on how the three degradation modes can evolve. For this proof-of-concept work, we considered eight parameters to scan. For each degradation mode, degradation was chosen to follow equation (1).
%degradation = a × cycle + (exp(b × cycle) − 1)    (1)
Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added, a delay for the exponential factor for LLI, and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating cannot be explained by an increase of LAMs or resistance [55]. The chosen parameters and their values are summarized in Table S1 and their evolution is represented in Figure S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At the worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles. This is to guarantee at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive, and, after 3,000 cycles, the lowest degradation is of 3%. The reversibility factor p8 was calculated with equation (2) when LAMNE > PT.
%LLI = %LLI + p8 × (LAMPE − PT)    (2)
Where PT was calculated with equation (3) from [60].
PT = 100 − ((100 − LAMPE) / (100 × LRini − LAMPE)) × (100 − OFSini − LLI)    (3)
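A short Python sketch of equations (1)-(3) as written above; the variable names follow the text, and this is an illustration rather than the authors' code:

import numpy as np

def degradation(cycle, a, b):
    # Equation (1): %degradation = a * cycle + (exp(b * cycle) - 1)
    return a * cycle + (np.exp(b * cycle) - 1.0)

def plating_threshold(LAM_PE, LR_ini, OFS_ini, LLI):
    # Equation (3): PT = 100 - ((100 - LAMPE) / (100 * LRini - LAMPE)) * (100 - OFSini - LLI)
    return 100.0 - ((100.0 - LAM_PE) / (100.0 * LR_ini - LAM_PE)) * (100.0 - OFS_ini - LLI)

def lli_with_plating(LLI, p8, LAM_PE, PT):
    # Equation (2), applied when LAMNE > PT: %LLI = %LLI + p8 * (LAMPE - PT)
    return LLI + p8 * (LAM_PE - PT)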
Varying all those parameters accounted for more than 130,000 individual duty cycles, with one voltage curve for every 100 cycles. Six MATLAB© .mat files are included. The GIC-LFP_duty_other.mat file contains 12 variables:

Qnorm: normalized capacity scale for all voltage curves
p1 to p8: values used to generate the duty cycles
key: index indicating which values were used for each degradation path (1 - p1, ..., 8 - p8)
QL: capacity loss, one line per path, one column per 100 cycles.
The file GIC-LFP_duty_LLI-LAMsvalues.mat contains the values of LLI, LAMPE and LAMNE for all cycles (1 line per 100 cycles) and duty cycles (columns).
Files GIC-LFP_duty_1 to _4 contain the voltage data split into 1 GB chunks (40,000 simulations). Each cell corresponds to one line in the key variable. Inside each cell, there is one column per 100 cycles.
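A hedged example of loading the described files with SciPy; the variable names follow the description above but their exact capitalization in the files may differ, and MATLAB v7.3 files would need h5py instead:

from scipy.io import loadmat

data = loadmat("GIC-LFP_duty_other.mat")
qnorm = data["Qnorm"]  # normalized capacity scale for all voltage curves
key = data["key"]      # which parameter values were used for each path
ql = data["QL"]        # capacity loss: one line per path, one column per 100 cycles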
Search Coil Magnetometer (SCM) AC Magnetic Field (16384 samples/s), Level 2, High Speed Burst Mode Data. The tri-axial Search-Coil Magnetometer with its associated preamplifier measures three-dimensional magnetic field fluctuations. The analog magnetic waveforms measured by the SCM are digitized and processed inside the Digital Signal Processor (DSP), collected and stored by the Central Instrument Data Processor (CIDP) via the Fields Central Electronics Box (CEB). Prior to launch, all SCM Flight models were calibrated by LPP team members at the National Magnetic Observatory, Chambon-la-Foret (Orleans). Once per orbit, each SCM transfer function is checked thanks to the onboard calibration signal provided by the DSP. The SCM is operated for the entire MMS orbit in survey mode. Within scientific Regions Of Interest (ROI), burst mode data are also acquired, as well as high speed burst mode data. This SCM data set corresponds to the AC magnetic field waveforms in nanoTesla and in the GSE frame. The SCM instrument paper can be found at http://link.springer.com/article/10.1007/s11214-014-0096-9 and the SCM data product guide at https://lasp.colorado.edu/mms/sdc/public/datasets/fields/.
Data Set Overview =================
Version 1.1 ===========
This Version 1.1 Data Set replaces the Version 1.0 Data Set (DATA_SET_ID = VG1-J-PLS-5-ION-MOM-96.0SEC) previously archived with the PDS.
Data Set Description ====================
This Data Set contains the best Estimates of the total Ion Density at Jupiter during the Voyager 1 Encounter in the PLS Voltage Range (10 eV/Q to 5950 eV/Q). It is calculated using the Method of (McNutt et al., 1981), which to First Order consists of taking the Total measured Current and dividing by the Collector Area and Plasma Bulk Velocity. This Method is only accurate for high Mach Number Flows directly into the Detector, and may result in Underestimates of the total Density by a Factor of 2 in the outer Magnetosphere. Thus absolute Densities should be treated with Caution, but Density Variations in the Data Set can be trusted. The low resolution Mode Density is used before the Year 1979, Day 63 at 1300; after this the Larger of the high and low resolution Mode Densities in a 96 s Period is used, since the L-mode Spectra often are saturated. Corotation is assumed inside L=17.5, and a constant Velocity Component of 200 km/s into the D Cup is used outside of this. These are the Densities given in (McNutt et al., 1981) corrected by a Factor of 1.209 (0.9617) for Densities obtained from the Side (Main) Sensor. This Correction is due to a better Calculation of the Effective Area of the Sensors. Data Format: Column 1 is Time (yyyy-mm-ddThh:mm:ss.sssZ), Column 2 is the Moment Density in cm^-3. Each Row has Format (a24, 1x, 1pe9.2). Values of -9.99e+10 indicate that the Parameter could not be obtained from the Data using the standard Analysis Technique. Additional Information about this Data Set and the Instrument which produced it can be found elsewhere in this Catalog. An Overview of the Data in this Data Set can be found in (McNutt et al., 1981) and a complete Instrument Description can be found in (Bridge et al., 1977).
+---------------------------+
| Processing Level ID | 5   |
| Software Flag       | Y   |
+---------------------------+
Parameters ==========
Ion Density ===========
+-------------------------------------------------+
| Sampling Parameter Name        | TIME           |
| Data Set Parameter Name        | ION DENSITY    |
| Sampling Parameter Resolution  | 96.000000      |
| Sampling Parameter Interval    | 96.000000      |
| Minimum Available Sampling Int | 96.000000      |
| Data Set Parameter Unit        | CM^-3          |
| Sampling Parameter Unit        | SECOND         |
+-------------------------------------------------+
A derived Parameter equaling the Number of Ions per Unit Volume over a specified Range of Ion Energy, Energy per Charge, or Energy per Nucleon. Discrimination with regard to Mass and/or Charge State is necessary to obtain this Quantity; however, Mass and Charge State are often assumed due to Instrument Limitations.
Many different Forms of Ion Density are derived. Some are distinguished by their Composition (N+, Proton, Ion, etc.) or their Method of Derivation (Maxwellian Fit, Method of Moments). In some cases, more than one Type of Density will be provided in a single Data Set. In general, if more than one Ion Species is analyzed, either by Moment or Fit, a total Density will be provided which is the Sum of the Ion Densities. If a Plasma Component does not have a Maxwellian Distribution the actual Distribution can be represented as the Sum of several Maxwellians, in which case the Density of each Maxwellian is given.
Source Instrument Parameters ============================
+----------------------------------------------------------------+
| Instrument Host ID              | VG1                          |
| Data Set Parameter Name         | ION DENSITY                  |
| Instrument Parameter Name       | ION RATE                     |
|                                 | ION CURRENT                  |
|                                 | MULTIPLE PARTICLE PARAMETERS |
| Important Instrument Parameters | 1 (for all Parameters)       |
+----------------------------------------------------------------+
Processing ==========
Processing History ==================
+-----------------------------------------------------+
| Source Data Set ID  | VG1-PLS                     |
| Software            | MOMANAL                     |
| Product Data Set ID | VG1-J-PLS-5-ION-MOM-96.0SEC |
+-----------------------------------------------------+
Software MOMANAL ================
Software
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This commuter mode share data shows the estimated percentages of commuters in Champaign County who traveled to work using each of the following modes: drove alone in an automobile; carpooled; took public transportation; walked; biked; went by motorcycle, taxi, or other means; and worked at home. Commuter mode share data can illustrate the use of and demand for transit services and active transportation facilities, as well as for automobile-focused transportation projects.
Driving alone in an automobile is by far the most prevalent means of getting to work in Champaign County, accounting for about 64 percent of all work trips in 2024. This is a statistically significant decrease since 2023, which was the first year that matched pre-COVID-19 pandemic levels of driving alone.
The percentage of workers who commuted by all other means to a workplace outside the home also decreased from 2019 to 2021, with most of these modes reaching a record low since this data first started being tracked in 2005. All of these modes except public transportation saw increases from 2023 to 2024, but the increases were not statistically significant.
Meanwhile, the percentage of people in Champaign County who worked at home more than quadrupled from 2019 to 2021, reaching a record high over 18 percent. It is a safe assumption that this can be attributed to the increase of employers allowing employees to work at home when the COVID-19 pandemic began in 2020.
The work-from-home figure decreased to 11.2 percent in 2023, the first statistically significant decrease since the pandemic began. However, the figure saw a statistically significant increase from 2023 to 2024, rising back to 15.1 percent in 2024. This figure is about 3.3 times higher than in 2019, despite the COVID-19 emergency ending in 2023.
Commuter mode share data was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.
As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.
For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Means of Transportation to Work.
Sources:
U.S. Census Bureau; American Community Survey, 2024 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (19 November 2024).
U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (18 September 2024).
U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (10 October 2023).
U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (14 October 2022).
U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).
U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).
U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (13 September 2018).
U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (14 September 2017).
U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (19 September 2016).
U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
The Advanced Land Observing Satellite-2 (ALOS-2), launched May 14, 2014, is a follow-on mission to ALOS. ALOS-2 features an imaging microwave radar, PALSAR-2 (Phased Array type L-band Synthetic Aperture Radar-2). With a wider range of observation modes, PALSAR-2 improves on its predecessor PALSAR (which flew onboard ALOS). PALSAR-2 observes in the L-band, specifically 1257.5 MHz, which is adjustable by ± 21 MHz. ScanSAR mode provides a spatial resolution of 60 m and 100 m for a swath of 490 km and 350 km respectively. Stripmap mode provides a resolution of 10 m, 6 m, and 3 m for a swath of 70 km, 70 km, and 50 km respectively. Spotlight mode provides a resolution of 1 m x 3 m for a swath of 25 km x 25 km. ALOS-2 also decreased the revisit time of its predecessor from 46 days to 14 days. ALOS-2 also has the ability to look either to the right or the left. This collection provides open access to ScanSAR mode data acquired by ALOS-2. Recently acquired data are not included but are added when released for open access by JAXA. Although the ScanSAR mode has been acquired globally, the collection currently contains only partial global coverage, but coverage is increasing as data become available. Products have been processed by JAXA as Level 1.1, such that range and single look azimuth compressed data is represented by complex I and Q channels to preserve the magnitude and phase information. The range coordinate is in slant range. The image is focused onto the zero Doppler direction, and an image file is generated per each scan for ScanSAR mode. In addition, the full aperture processing method was used to generate the products. With the full aperture method, range compression and one-look azimuth compression are performed for the data whose gaps between neighboring bursts in a sub-swath are filled with zeroes; this processing is performed for each scan and each polarization. Granules in this collection are distributed as a zip and are quite large, with the majority being 28GB (single polarization) or 56GB (dual polarization).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DFEI dataset
The full description can also be found in README.md.
The dataset was used in the paper “GNN for Deep Full Event Interpretation and hierarchical reconstruction of heavy-hadron decays in proton-proton collisions”. The project describes a full event interpretation at the LHCb experiment, situated at the Large Hadron Collider at CERN, Geneva. An “event” consists of detector responses that were converted to tracks; each track represents a particle.
The aim of the algorithm is to make sense of the tracks and bundle together tracks coming from the same origin, as well as interpreting their decay hierarchy.
Generated events
The events in this dataset are based on simulation generated with PYTHIA8 and EvtGen, in which the particle-collision conditions expected for the LHC Run 3 are replicated as shown in the table.
| LHCb period | Num. vis. pp collisions | Num. tracks | Num. b hadrons | Num. c hadrons |
|---|---|---|---|---|
| Runs 3-4 (Upgrade I) | ∼ 5 | ∼ 150 | ≪ 1 | ∼ 1 |
Additionally, an approximate emulation of the LHCb detection and reconstruction effects is applied, as described in the paper in the appendix “Simulation”. In the generated dataset, each event is required to contain at least one b-hadron, which is subsequently allowed to decay freely through any of the standard decay modes present in PYTHIA8. On average, 40% of those events contain more than one b-hadron decay, with a maximum b-hadron decay multiplicity of five. Only charged stable particles that have been produced inside the LHCb geometrical acceptance and in the Vertex Locator region (as defined in the paper) are included in the datasets.
Datasets
The datasets are divided in three categories
Training and testing
The file Dataset_InclusiveHb_Training.root contains the training dataset (40,000 events) and the test dataset (10,000 events) of inclusive decays.
Evaluation
The inclusive dataset Dataset_InclusiveHb_Evaluation.root contains the evaluation events (50,000).
Exclusive decays
In addition to this inclusive dataset, several other smaller samples (of few thousand events each) have also been generated, requiring that all the events in each sample contained a specific (exclusive) type of b-hadron decay. The specific modes have been chosen to be representative of the most common classes of decay topologies of physics interest for LHCb. These samples contain only events in which all the particles originating from each of the considered exclusive decays have been produced inside the LHCb geometrical acceptance and in the Vertex Locator region.
The datasets contained are:
Dataset_Bd_DD.root
Dataset_Bd_Kpi.root
Dataset_Bd_Kstmumu.root
Dataset_Bs_Dspi.root
Dataset_Bs_Jpsiphi.root
Dataset_Bu_KKpi.root
Dataset_Lb_Lcpi.root
More information on them can be found in the paper.
Loading the data
The dataset is saved in the binary ROOT format with a key-array mapping. It can be loaded using the uproot Python library to convert it to a pandas DataFrame or similar.
An example snippet is given here:
import uproot

treename = "Relations"
with uproot.open('/path/to/file.root') as file:
    df = file[treename].arrays(
        # we can specify only a set of branches
        # ['EventNumber', 'FromSamePV_true'],
        library='pd')  # 'pd' for pandas
The returned file behaves like a mapping that contains two different data holders. They are accessible with Relations or Particles that contain either the relations between the particles or the particles themselves.
Regarding the Relations, only edges connecting two different particles are contained in the dataset. The edges are treated as not directional, so a single edge is considered for each pair of particles.
Variables
The relevant features used in the GNN are described in the following. A cartesian right-handed coordinate system is used, with the z axis pointing along the beamline, the x axis being parallel to the horizontal and the y axis being vertically oriented. When specified in the name of the variables, the suffix “_true” refers to ground-truth information, and the suffix “_reco” refers to the output of the emulated LHCb reconstruction.
General:
EventNumber: unique number to identify the event that the entry belongs to.
Node variables:
ParticleKey: unique number to identify each particle in a given event.
Identity (ID): numerical code identifying the type of particle, following the Monte Carlo Particle Numbering Scheme.
FromPrimaryBeautyHadron: boolean variable indicating whether the particle has been produced in a beauty hadron decay or not.
Transverse momentum (pT): component of the three-momentum transverse to the beamline, i.e. the x and y component combined.
Impact parameter with respect to the associated primary vertex (IP): distance of closest approach between the particle trajectory and its associated primary vertex (proton-proton collision point), defined as the one with the smallest IP for the given particle amongst all the primary vertices in the event.
Pseudorapidity (η): spatial coordinate describing the angle of a particle relative to the beam axis, computed as η = arctanh(pz/∥p⃗∥).
Charge (q): for the stable particles under consideration, the charge can take the value 1 or -1.
Ox, Oy, Oz: cartesian coordinates of the origin point of the particle.
px, py, pz: cartesian coordinates of the three-momentum.
PVx, PVy, PVz: cartesian coordinates of the position of the associated primary vertex.
Edge variables:
FirstParticleKey: ParticleKey of one of the two particles connected by the edge.
SecondParticleKey: ParticleKey of the other particle, verifying FirstParticleKey > SecondParticleKey.
FromSamePrimaryBeautyHadron: boolean variable indicating whether the two particles originate from the same beauty hadron decay.
Opening angle (θ): angle between the three-momentum directions of the two particles.
Momentum-transverse distance (d ⊥ P⃗): distance between the origin point of the two particles defined on a plane which is transverse to the combined three momentum of the two particles.
Distance along the beam axis (Δz): difference between the z-coordinate of the origin points of the two particles.
FromSamePV: boolean variable indicating whether the two particles share the same associated primary vertex.
Order of the “topological” Lowest Common Ancestor (TopoLCAOrder): variable that can take the values 0, 1, 2 or 3, as explained in the paper.
Identity of the “topological” Lowest Common Ancestor (TopoLCAID): numerical code identifying the particle type of the ancestor, following the Monte Carlo Particle Numbering Scheme.
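As an illustration, two of the kinematic variables defined above can be computed from the momentum components with numpy (a sketch under the stated definitions, not the project's own code):

import numpy as np

def pseudorapidity(px, py, pz):
    # eta = arctanh(pz / |p|)
    p = np.sqrt(px**2 + py**2 + pz**2)
    return np.arctanh(pz / p)

def opening_angle(p1, p2):
    # Angle between the three-momentum directions of the two particles.
    cos_theta = np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))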
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Modal Service data and Safety & Security (S&S) public transit time series data delineated by transit/agency/mode/year/month. Includes all Full Reporters--transit agencies operating modes with more than 30 vehicles in maximum service--to the National Transit Database (NTD). This dataset will be updated monthly.
The monthly ridership data is released one month after the month in which the service is provided. Records with null monthly service data reflect late reporting.
The S&S statistics provided include both Major and Non-Major Events where applicable. Events occurring in the past three months are excluded from the corresponding monthly ridership rows in this dataset while they undergo validation. This dataset is the only NTD publication in which all Major and Non-Major S&S data are presented without any adjustment for historical continuity.
The GHS is an annual household survey specifically designed to measure the living circumstances of South African households. The GHS collects data on education, employment, health, housing and household access to services.
The survey is representative at national level and at provincial level.
Households and individuals
The survey covered all de jure household members (usual residents) of households in the nine provinces of South Africa and residents in workers' hostels. The survey does not cover collective living quarters such as students' hostels, old age homes, hospitals, prisons and military barracks.
Sample survey data
A multi-stage, stratified random sample was drawn using probability-proportional-to-size principles. First level stratification was based on province and second-tier stratification on district council. The GHS 2009 represents the second year of a new master sample (the first year was GHS 2008) that will be used until 2010.
Face-to-face [f2f]
The GHS uses questionnaires as data collection instruments.
The questionnaire for the General Household Survey has undergone various changes since 2002. Significant changes were made to the GHS 2009 questionnaire and this should be borne in mind when comparing across different datasets. See GHS 2009 statistical release for a detailed report on important differences between the questionnaires.
In GHS 2009-2010:
The variable on care provision (Q129acre) in the GHS 2009 and 2010 should be used with caution. The question to collect the data (question 1.29a) asks:
"Does anyone in this household personally provide care for at least two hours per day to someone in the household who - owing to frailty, old age, disability, or ill-health cannot manage without help?"
Response codes (in the questionnaire, metadata, and dataset) are:
1 = No
2 = Yes, 2-19 hours per week
3 = Yes, 20-49 hours per week
4 = Yes, 50+ hours per week
5 = Do not know
There is an inconsistency between the question, which asks about hours per day, and the response options, which record hours per week. As a result, a respondent who provides care for one hour per day (7 hours per week) would presumably not answer this question, and someone providing care for 13 hours per week would likewise be excluded, even though they do provide substantial care.
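For analyses that use this variable anyway, the codes above can be labelled explicitly; a hypothetical sketch (the column name Q129acre is taken from the text, the file name is invented):

import pandas as pd

# note the day-vs-week mismatch discussed above; interpret with caution
care_labels = {
    1: 'No',
    2: 'Yes, 2-19 hours per week',
    3: 'Yes, 20-49 hours per week',
    4: 'Yes, 50+ hours per week',
    5: 'Do not know',
}
ghs = pd.read_csv('ghs2009.csv')  # hypothetical file name
ghs['care_provision'] = ghs['Q129acre'].map(care_labels)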
In GHS 2009-2015:
The variable on land size in the General Household Survey questionnaire for 2009-2015 should be used with caution. The data comes from questions on the households' agricultural activities in Section 8 of the GHS questionnaire: Household Livelihoods: Agricultural Activities. Question 8.8b asks:
“Approximately how big is the land that the household use for production? Estimate total area if more than one piece.” One of the response categories is worded as:
1 = Less than 500m2 (approximately one soccer field)
However, a soccer field is approximately 5000 m2, not 500 m2, so response category 1 is incorrect; the category should read 5000 m2. This response option is correct in GHS 2002-2008, and the error was flagged and corrected by Statistics SA in the GHS 2016.
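Users relabelling this variable for 2009-2015 can patch the label directly; a hypothetical sketch:

# corrected value label for the GHS 2009-2015 land-size question (8.8b)
land_size_labels = {
    1: 'Less than 5000m2 (approximately one soccer field)',  # corrected from '500m2'
    # ... remaining categories as documented in the GHS metadata ...
}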
PSP FIELDS Digital Fields Board (DFB) Single Ended Voltage data:
The DFB is the low frequency, less than 75 kHz, component of the FIELDS experiment on the Parker Solar Probe spacecraft, see reference [1] below. For a full description of the FIELDS experiment, see reference [2]. For a description of the DFB, see reference [3].
The DFB continuous waveform data consist of time series data from various FIELDS sensors. These data have been filtered by both analog and digital filters [3].
The Level 2 data products contained in this data file have been calibrated for:
1) DFB in-band gain
2) DFB analog filter gain and phase response
3) DFB digital filter phase response
4) The search coil preamplifier gain and phase response, when applicable
Calibrations for the DFB digital filter gain response have not been implemented, but the required convolution kernel is provided in this file. It was decided not to apply the digital filter gain response calibration to these Level 2 data because it can introduce non-physical power at high frequencies when the uncorrected signal is dominated by noise. This effect should be examined carefully when determining spectral slopes and features at the highest frequencies. Calibrations for the FIELDS voltage sensor preamplifiers have not been implemented, as the preamplifier response is flat and equal to one throughout the DFB frequency range. Corrections for plasma sheath impedance gain and antenna effective length have not been applied to the voltage sensor data; these corrections will be applied in the Level 3 DFB data products. Therefore, all voltage sensor quantities present in these Level 2 data products are expressed in units of Volts. Likewise, all magnetic field quantities present in these Level 2 data products are expressed in units of nanoteslas.
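Should users apply the digital filter gain correction themselves, a hypothetical sketch (array names invented) is:

import numpy as np

def apply_digital_filter_gain(waveform, kernel):
    # convolve a DFB waveform with the kernel provided in the file;
    # as noted above, this can introduce non-physical power at high
    # frequencies when the uncorrected signal is noise-dominated
    return np.convolve(waveform, kernel, mode='same')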
The Level 2 data products contained in this data file are in spacecraft coordinates (e.g. x, y, z) or in sensor coordinates (e.g. dV12, dV34 for voltage measurements and [u,v,w] for the search coil magnetometer).
The time resolution of the DFB continuous waveform data can vary by multiples of 2^N. During encounter when PSP is within 0.25 AU of the Sun, the DFB continuous waveform data cadence is typically 256 samples/NYsecond [2].
References:
1) Fox, N.J., Velli, M.C., Bale, S.D., et al., Space Sci. Rev. (2016) 204:7. https://doi.org/10.1007/s11214-015-0211-6
2) Bale, S.D., Goetz, K., Harvey, P.R., et al., Space Sci. Rev. (2016) 204:49. https://doi.org/10.1007/s11214-016-0244-5
3) Malaspina, D.M., Ergun, R.E., Bolton, M., et al., JGR Space Physics (2016), 121, 5088-5096. https://doi.org/10.1002/2016JA022344
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Accident Detection Model is built with YOLOv8, Google Colab, Python, Roboflow, deep learning, OpenCV, machine learning, and artificial intelligence. It can detect an accident from a live camera feed, an image, or a video. The model is trained on a dataset of 3,200+ images annotated on Roboflow.
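A minimal inference sketch with the ultralytics API (the weights file name is hypothetical):

from ultralytics import YOLO

model = YOLO('accident_yolov8.pt')  # hypothetical trained weights

# source can be an image path, a video path, or 0 for a live camera
results = model.predict(source='dashcam_frame.jpg', conf=0.5)
for r in results:
    print(r.boxes)  # detected accident bounding boxes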
Survey image: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png
Distributed fiber optic sensing was an important part of the monitoring system for EGS Collab Experiment #2. A single loop of custom fiber package was grouted into the four monitoring boreholes that bracketed the experiment volume. This fiber package contained two multi-mode fibers and four single-mode fibers. These fibers were connected to an array of fiber optic interrogator units, each targeting a different measurement. The distributed temperature system (DTS) consisted of a Silixa XT-DTS unit connected to both ends of one of the two multi-mode fibers. This system measured absolute temperature along the entire length of the fiber for the duration of the experiment at a sampling rate of approximately 10 minutes. This dataset includes both raw data in XML format from the XT-DTS and a processed dataset in which only the sections of data pertaining to the boreholes have been extracted. We have also included a report that provides all of the relevant details necessary for users to process and interpret the data themselves. Please read this accompanying report; if questions remain after reading it, please do not hesitate to contact us. Happy processing.
By VISHWANATH SESHAGIRI [source]
If you use this dataset in your research, please credit the original authors.
License
Unknown License - Please check the dataset description for more information.
File: YouTubeDataset_withChannelElapsed.csv

| Column name | Description |
|:---|:---|
| totalviews/channelelapsedtime | Ratio of total views to channel elapsed time. (Ratio) |
| channelViewCount | Total number of views for the channel. (Integer) |
| likes/subscriber | ... |
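A minimal sketch loading the file named above and summarising two of the documented columns:

import pandas as pd

yt = pd.read_csv('YouTubeDataset_withChannelElapsed.csv')
print(yt[['totalviews/channelelapsedtime', 'channelViewCount']].describe())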