The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters, including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous, binary, or categorical measurements recorded at one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. Here, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain and novel algorithms, and briefly present results on synthetic and real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams in the aviation industry which are not detectable using state-of-the-art methods.
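As a rough, hypothetical illustration of the multiple-kernel idea in this abstract (not the authors' exact algorithm), one kernel can be built over the continuous parameters, another over the discrete event indicators, and the two combined as a weighted sum feeding a one-class SVM; the kernel choices, the mixing weight eta, and the detector below are all assumptions:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
n = 200
X_cont = rng.normal(size=(n, 5))             # continuous flight parameters (synthetic)
X_disc = rng.integers(0, 2, size=(n, 10))    # binary switch/event indicators (synthetic)

K_cont = rbf_kernel(X_cont, gamma=0.1)       # Gaussian kernel on continuous data
# Simple matching kernel on discrete data: fraction of agreeing indicators.
K_disc = (X_disc @ X_disc.T + (1 - X_disc) @ (1 - X_disc).T) / X_disc.shape[1]

eta = 0.5                                    # kernel mixing weight (assumed)
K = eta * K_cont + (1 - eta) * K_disc        # convex combination is still a valid kernel

ocsvm = OneClassSVM(kernel="precomputed", nu=0.05).fit(K)
scores = ocsvm.decision_function(K)          # negative scores flag potential anomalies
print("flagged flights:", np.flatnonzero(scores < 0))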
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Introduction
The ComplexVAD dataset consists of 104 training and 113 testing video sequences taken from a static camera looking at a scene of a two-lane street with sidewalks on either side of the street and another sidewalk going across the street at a crosswalk. The videos were collected over a period of a few months on the campus of the University of South Florida using a camcorder with 1920 x 1080 pixel resolution. Videos were collected at various times during the day and on each day of the week. Videos vary in duration with most being about 12 minutes long. The total duration of all training and testing videos is a little over 34 hours. The scene includes cars, buses and golf carts driving in two directions on the street, pedestrians walking and jogging on the sidewalks and crossing the street, people on scooters, skateboards and bicycles on the street and sidewalks, and cars moving in the parking lot in the background. Branches of a tree also move at the top of many frames.
The 113 testing videos have a total of 118 anomalous events consisting of 40 different anomaly types.
Ground truth annotations are provided for each testing video in the form of bounding boxes around each anomalous event in each frame. Each bounding box is also labeled with a track number, meaning each anomalous event is labeled as a track of bounding boxes. A single frame can have more than one anomaly labeled.
At a Glance
License
The ComplexVAD dataset is released under the CC-BY-SA-4.0 license.
All data:
Created by Mitsubishi Electric Research Laboratories (MERL), 2024
SPDX-License-Identifier: CC-BY-SA-4.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identifying errors or anomalous values, collectively considered outliers, assists in exploring data, and removing outliers improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of entire cycles, although exploring fewer points using a ‘moving window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and flagged outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
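The abstract refers to supplied Matlab code; the following Python sketch shows one plausible reading of the two stages (the thresholds, window handling, and exact statistics are assumptions for illustration):

import numpy as np

def stage1_mad(cycles, thresh=3.5):
    # Stage 1: flag cycles that are spatial outliers at any time point,
    # using a robust z-score based on the median absolute deviation.
    med = np.median(cycles, axis=0)
    mad = np.median(np.abs(cycles - med), axis=0)
    mad = np.where(mad == 0, np.finfo(float).eps, mad)   # guard against zero MAD
    z = 0.6745 * np.abs(cycles - med) / mad              # 0.6745 scales MAD to sigma
    return np.any(z > thresh, axis=1)

def stage2_moving_sd(cycles, window=1, thresh=3.0):
    # Stage 2: flag spatial-temporal outliers via a moving-window standard deviation.
    n_cycles, n_points = cycles.shape
    flags = np.zeros(n_cycles, dtype=bool)
    for i in range(n_points):
        lo, hi = max(0, i - window), min(n_points, i + window + 1)
        seg = cycles[:, lo:hi]                           # window around time point i
        dev = np.abs(seg.mean(axis=1) - seg.mean())      # per-cycle deviation in window
        flags |= dev > thresh * seg.std()
    return flags

strides = np.random.default_rng(1).normal(size=(38, 101))  # 38 cycles, 0-100% of stride
flagged = stage1_mad(strides) | stage2_moving_sd(strides)
print(f"{flagged.sum()} cycles flagged for review")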
The 30-min x 30-min Terrestrial Mean Free-Air Gravity Anomaly and Geoid Undulations Data Base was compiled and developed by the Ohio State University. This data base was received in March 1993. Principal gravity parameters include mean elevation, mean free-air anomaly and source id. The gravity anomaly computation uses the Geodetic Reference System 1967 (GRS 67) theoretical formula. The data are global in coverage where data are available.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Introduction
The Street Scene dataset consists of 46 training video sequences and 35 testing video sequences taken from a static USB camera looking down on a scene of a two-lane street with bike lanes and pedestrian sidewalks. Videos were collected from the camera at various times during two consecutive summers. All of the videos were taken during the daytime. The dataset is challenging because of the variety of activities taking place such as cars driving, turning, stopping and parking; pedestrians walking, jogging and pushing strollers; and bikers riding in bike lanes. In addition, the videos contain changing shadows, and moving background such as a flag and trees blowing in the wind.
There are a total of 202,545 color video frames (56,135 for training and 146,410 for testing) each of size 1280 x 720 pixels. The frames were extracted from the original videos at 15 frames per second.
The 35 testing sequences have a total of 205 anomalous events consisting of 17 different anomaly types. A complete list of anomaly types and the number of each in the test set can be found in our paper.
Ground truth annotations are provided for each testing video in the form of bounding boxes around each anomalous event in each frame. Each bounding box is also labeled with a track number, meaning each anomalous event is labeled as a track of bounding boxes. Track lengths vary from tens of frames to 5200 frames, which is the length of the longest testing sequence. A single frame can have more than one anomaly labeled.
NOTE: This version of the dataset differs slightly from the original made available in 2020. Some anomalies were found in a few of the normal training sequences, and the affected training frames were deleted from the dataset. Specifically, the following frames were removed:
Train026: frames 1-184 (car taking a u-turn)
Train027: frames 1-229 (jay walkers)
Train031: frames 1-299 (jay walkers, illegally parked car)
At a Glance
Other Resources
None
Citation
If you use the Street Scene dataset in your research, please cite our contribution:
@inproceedings{ramachandra2020street,
title={Street Scene: A new dataset and evaluation protocol for video anomaly detection},
author={Ramachandra, Bharathkumar and Jones, Michael},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={2569--2578},
year={2020}
}
License
The Street Scene dataset is released under the CC-BY-SA-4.0 license.
All data:
Created by Mitsubishi Electric Research Laboratories (MERL), 2023
SPDX-License-Identifier: CC-BY-SA-4.0
This accession contains the gridded 5-day mean sea surface height anomaly (SSHA) and Ku-band significant wave height (SWH-KU) observed from the Jason-1 and OSTM/Jason-2 satellites. This dataset was generated in near-real time during the Deepwater Horizon oil spill event to provide users a quick view of sea level change and wave activity in the Gulf of Mexico, though its coverage is global. All of the observations within each 5-day window are interpolated onto a uniform 0.25-degree longitude by 0.25-degree latitude grid.
The OSTM/Jason-2 observations are select Interim Geophysical Data Records (IGDR) which are archived under NODC Accession 0043269 at https://accession.nodc.noaa.gov/43269. The Jason-1 satellite IGDR data are from the JPL/NASA PO.DAAC, from ftp://podaac.jpl.nasa.gov/pub/sea_surface_height/jason/igdr_ssha_netcdf/data/.
The original data above have been processed, gridded, visualized, and finally converted into NetCDF format by scientists in NODC's satellite oceanography group. In the latest version, the data time period in this accession has been updated through August 8, 2010.
The 1x1 degree Terrestrial Mean Free-Air Gravity Anomaly and Geoid Undulations Data Base was compiled and developed by the Ohio State University. This data base was received in March 1993. Principal gravity parameters include mean elevation, mean free-air anomaly and source id. The gravity anomaly computation uses the Geodetic Reference System 1967 (GRS 67) theoretical formula. The data are global in coverage where data are available.
This data file contains global-mean thermosteric sea level anomalies and associated errors between 1950 and 2003, based on the ENACT 3 data set with the Wijffels et al. (2008) fall-rate corrections applied to the eXpendable BathyThermograph (XBT) data. There are two files, one with yearly averages and the other with three-year running means. Units are millimetres; errors are one-sigma, in the same units as the variables. The period is 1950 to 2003, relative to 1961 (zero-crossing). The yearly averages are taken over calendar years, and the time in the file is the centre of the year averaged over; three-year means are likewise centred on the time shown. The depth integrations are 0-100 m, 0-300 m, and 0-700 m. The three-year running means are as plotted in Domingues et al. (2008) and are recommended as the best to use.
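As a tiny illustration of a centred three-year running mean like the one described (the values and series name are invented):

import pandas as pd

tsl = pd.Series([10.0, 12.0, 9.0, 14.0, 11.0],
                index=[1999, 2000, 2001, 2002, 2003], name="thermosteric_mm")
running3 = tsl.rolling(window=3, center=True).mean()   # centred on each year
print(running3)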
This data set provides gridded daily global estimates of sea level anomaly based on satellite altimetry measurements. Sea level anomaly is defined as the height of water over the mean sea surface in a given time and region. In this dataset sea level anomalies are computed with respect to a twenty-year mean reference period (1993-2012) using up-to-date altimeter standards.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Gridded monthly, seasonal and annual mean temperature anomalies, derived from daily minimum, maximum and mean surface air temperatures (degrees Celsius), are available at a 50 km resolution across Canada. The Canadian gridded data (CANGRD) are interpolated from homogenized temperature (i.e., AHCCD) datasets. Homogenized temperatures incorporate adjustments to the original station data to account for discontinuities from non-climatic factors, such as instrument changes or station relocation. The anomalies are the difference between the temperature for a given year or season and a baseline value (defined as the average over the 1961-1990 reference period). The yearly and seasonal temperature anomalies were computed for the years 1948 to 2017. The data will continue to be updated every year.
Gridded seasonal mean temperature anomalies, derived from daily minimum, maximum and mean surface air temperatures (degrees Celsius), and anomalies derived from daily total precipitation, are available at a 50 km resolution across Canada. The Canadian gridded data (CANGRD) are interpolated from homogenized temperature (i.e., AHCCD) datasets. Homogenized temperatures incorporate adjustments to the original station data to account for discontinuities from non-climatic factors, such as instrument changes or station relocation. The anomalies are the difference between the temperature for a given year or season and a baseline value (defined as the average over the 1961-1990 reference period). The yearly and seasonal temperature anomalies were computed for the years 1948 to 2017, and the data will continue to be updated every year. For precipitation, the CANGRD data are interpolated from adjusted precipitation (i.e., AHCCD) datasets. Adjusted precipitation data incorporate adjustments to the original station data to account for discontinuities from non-climatic factors, such as instrument changes or station relocation. The precipitation anomalies are the percentage difference between the value for a given year or season and a baseline value (the average over the 1961-1990 reference period). The yearly and seasonal relative precipitation anomalies were computed for the years 1948 to 2014, and the data will be updated as time permits. A small worked example of both anomaly definitions follows.
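As a small worked example of the anomaly definitions used by CANGRD (all values invented): temperature anomalies are differences from the 1961-1990 baseline, and precipitation anomalies are percentage differences from it:

baseline_temp = 10.2                         # 1961-1990 mean for a grid cell, deg C
temp_2017 = 11.5
temp_anomaly = temp_2017 - baseline_temp     # +1.3 deg C

baseline_precip = 800.0                      # 1961-1990 mean precipitation, mm
precip_2014 = 720.0
precip_anomaly_pct = 100.0 * (precip_2014 - baseline_precip) / baseline_precip  # -10.0 %
print(temp_anomaly, precip_anomaly_pct)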
The dataset tracks how global temperatures have changed over time relative to the 1951-1980 reference period. Source: GISS.
An additional sub-indicator will be provided to evaluate the intra-annual changes in chlorophyll-a concentration anomalies in each Exclusive Economic Zone (EEZ), using the NOAA VIIRS chlorophyll-a ratio anomaly product produced daily for the globe at 2 km spatial resolution. The daily global VIIRS chlorophyll-a concentrations are produced from the NOAA Multi-Sensor Level 1 to Level 2 (MSL12) processing of the VIIRS sensor on the Suomi NPP satellite (Wang et al., 2017; Wang et al., 2014). The anomaly product is the difference between the daily chlorophyll-a concentration and a rolling 61-day mean baseline with a 15-day lag (based on Stumpf et al., 2003), normalized by the rolling 61-day mean to give a proportional difference anomaly. The processing steps are outlined below; a small worked example follows.

1. Classify and count pixels as moderate, high or extreme anomalies. For each day in the reporting year, pixels in the global EEZ area (as defined in World EEZ v11) are classified as moderate (in the 90th percentile), high (in the 95th percentile) or extreme (in the 99th percentile). The number of days a given pixel is classified as moderate, high, or extreme within each month is then calculated, along with the number of days in each month where the pixel has valid data.

2. Calculate the monthly statistics. Because these anomalies are based on daily observations, data gaps are expected due to cloud cover, sun glint, high sensor zenith angle, high sun zenith angle, and other possible algorithm flags. To avoid bias due to non-valid data retrievals, the frequencies are normalized using the number of days in the month with valid observations:

relative frequency of classified pixel chlorophyll-a anomalies = α_c / ε

where α_c is the number of days in the month classified with anomaly class c (moderate, high, or extreme), and ε is the number of days in the month with valid data. Finally, the monthly mean of the relative frequencies for each class is calculated for each EEZ, resulting in three monthly values, one for each class per country. For more information, see the Global Manual of Ocean Statistics, available in the UNEP Document Repository (pages 19-24).
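As a hedged illustration of the normalization step above (the daily counts are invented; the actual product computes these per pixel and then averages over each EEZ):

days_classified = {"moderate": 6, "high": 3, "extreme": 1}   # alpha_c for one pixel, one month
valid_days = 22                                              # epsilon: days with valid data

rel_freq = {c: n / valid_days for c, n in days_classified.items()}
print(rel_freq)   # relative frequency of each anomaly class for the month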
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The Moisture Anomaly Index (Palmer-Z) is an estimate of the moisture difference from normal (a 30-year mean). It attempts to express conditions for the current month regardless of what may have occurred before the month in question.
These global and hemispheric temperature anomaly time series, which incorporate land and marine data, are continually updated and expanded by P. Jones of the Climatic Research Unit (CRU) with help from colleagues at the CRU and other institutions. Some of the earliest work in producing these temperature series dates back to Jones et al. (1986a,b,c), Jones (1988, 1994), and Jones and Briffa (1992). Most of the discussion of methods given here has been gleaned from the Frequently Asked Questions section of the CRU temperature data web pages. Users are encouraged to visit the CRU Web site for the most comprehensive overview of these data (the "HadCRUT4" dataset), other associated datasets, and the most recent literature references to the work of Jones et al. The data files are available from the CDIAC data transition website: http://cdiac.ornl.gov/trends/temp/jonescru/jones.html
Gridded (adjusted) sea level anomaly (GSLA), gridded sea level (GSL) and surface geostrophic velocity (UCUR, VCUR) for the Australasian region. GSLA is mapped using optimal interpolation of detided, de-meaned, inverse-barometer-adjusted altimeter and tide-gauge estimates of sea level. GSL is GSLA plus an estimate of the departure of mean sea level from the geoid, namely the mean sea level (over 18 years of model time) of the Ocean Forecasting Australia Model version 3 (OFAM3). The geostrophic velocities are derived from GSLA and the mean surface velocity from OFAM3. The altimeter data window for input to the Delayed Mode (DM) maps is symmetrical about the map date; the width of the window depends on the number of altimeters flying at the time and ranges from 30 days to 14 days. The altimeter data window for Near Real Time (NRT) maps is asymmetrical about the analysis date (-21 to +4 days). For both NRT and DM, altimeter data are weighted by the difference between the analysis date of the map and the time of each altimeter observation. References: http://imos.aodn.org.au/oceancurrent
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The results represent experiments on four datasets, based on 20 simulated runs. The proposed method (NewAlgo) produces the best overall results.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data has been gathered and analyzed by Massimo Schembri during his internship at the University of Bologna, under the supervision of Andrea Borghesi (assistant professor at the same university).
The data was collected from a monitored supercomputer called "Marconi", hosted at CINECA; it was gathered with a holistic data monitoring infrastructure called Examon, developed by researchers from the University of Bologna in collaboration with CINECA system administrators.
The data set covers two monitored periods, January and May 2020; for these periods, there is data relating to a subset of the nodes in the Marconi supercomputer.
The information monitored on Marconi's nodes is varied, ranging from the load of the different cores to the temperature of the room where the nodes are located, the speed of the fans, details on memory read/write accesses, etc. The sampling rate of the data at the source varies between 5 and 10 seconds; however, in this data set the data are aggregated into 5-minute intervals. In particular, the mean value ("avg:") and variance ("var:") are computed over each 5-minute interval.
In the CSVs, each row corresponds to a different timestamp (first column on the left), with consecutive rows therefore separated by intervals of 5 minutes. For example, a timestamp equal to "2020-01-01 02:10:00" indicates that the mean and variance values were calculated over the previous 5 minutes (2020-01-01 02:05:00 - 2020-01-01 02:10:00). The remaining columns (apart from the last two) contain the aggregate metrics (mean and variance). The last column, "Jobs", indicates the number of applications (called HPC jobs) finished on the node in the last half hour.
The penultimate column, "Label", indicates the presence or absence of a failure on the node (as registered by the Nagios monitoring service).
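As a hedged sketch of loading one of these CSVs with pandas (the file name and the convention that a non-zero "Label" marks a failure are assumptions based on the description above):

import pandas as pd

df = pd.read_csv("marconi_node.csv",                 # hypothetical file name
                 index_col=0, parse_dates=True)      # first column holds the timestamp
metrics = df.drop(columns=["Label", "Jobs"])         # the aggregated avg:/var: metrics
failures = df[df["Label"] != 0]                      # intervals flagged as failures (assumed)
print(f"{len(failures)} of {len(df)} 5-minute intervals flagged")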
This accession contains ocean heat content change, oceanic temperature and salinity changes, and steric sea level change (change in volume without change in mass), which are important variables for monitoring the oceanic environment and, by extension, the Earth's energy and freshwater budgets. NCEI, as the repository for oceanic subsurface profile data, especially temperature and salinity profiles, has the data to calculate historic and recent changes in the above-mentioned ocean variables. Data gathered from the NCEI archives, uniformly formatted and quality controlled in the World Ocean Database, are used to calculate temperature (and salinity) differences from a long-term climatological mean (the World Ocean Atlas series) at specific depth levels. These anomalies are used to derive integrals of ocean heat content and means for all other variables listed above.
The entire collection of GEOSAT ERM (Nov. '86 - Dec. '89) data over land and ice regions is held at the National Geophysical Data Center (NGDC). These data will yield reasonable elevation values for land and ice regions of gently varying elevation; this data collection should not be used in regions of highly variable terrain. This satellite altimeter data base contains precise geoid and gravity anomaly profiles which were constructed from the average of 66 repeat cycles of GEOSAT. The data were developed by Professor David T. Sandwell at the University of California, San Diego. The data are contained in two files: (1) geo66asc.bin (2,383,232 records) contains the ascending profiles, which run southeast to northwest between 72S and 72N, and (2) geo66des.bin (2,397,888 records) contains all of the descending profiles. The data parameters, in addition to time and location, are geoid height, gravity anomaly, and uncertainty in gravity anomaly. GEOSAT 66 was updated in 1994 to include the 3rd and last year of data; thus 66 repeat cycles of data are included in the AVERAGE profile calculation.

This satellite altimeter data base was contributed by the NOS/Geoscience Laboratory and contains data collected during the first 18 months of the original "Geodetic Mission" of the U.S. Navy Geodetic Satellite (GEOSAT). These digital data are in the form of geophysical data records (GDRs), which are described in NOAA Technical Memorandum NOS NGS-46. The data are observed over a tightly spaced (typically 2 or 3 km at 60 degrees latitude) ground track pattern and are global in coverage. The Southern Ocean data contained in this subset of the original Geodetic Mission were declassified in 1990 and received at NGDC in mid-1991.

GEOSAT GRAVITY ANOMALY GRID SOUTH OF 30 SOUTH (K.M. Marks, D.C. McAdoo, and W.H.F. Smith): The Geosciences Laboratory, Ocean and Earth Sciences (NOAA), has produced a digital gravity anomaly grid computed from recently declassified Geosat Geodetic Mission data, combined with Exact Repeat Mission data, for the region between 30 S and 72 S latitudes. The grid spacing is 0.04 degrees in latitude and 0.05 degrees in longitude. The grid file, g30_UNIX.BIN, is a binary file of two-byte signed integers, stored in raster scan line (bands of latitude) order. There are 1051 scan lines, with the first line at 30 S and the last at 72 S latitude. Each line has 7201 integers, with the first element at 0 E longitude and the last element at 360 E longitude. Values equal to 32767 indicate land areas where Geosat gravity is unavailable; all other values should be multiplied by 0.01 to yield free-air gravity anomalies in mGals. Data in g30_UNIX.BIN are in "normal" byte order (Sun, Mac, etc.); the equivalent file G30_DOS.DOS is in "swapped" byte order (DEC, PC, etc.). A reading sketch follows at the end of this entry.

RAPP92: This data base was compiled by Dr. Richard H. Rapp, Ohio State University, and was received in April 1993. The data base consists of the following: one file containing a 0.125 degree grid of free-air gravity anomalies and their standard deviations between +/- 72 degrees latitude (the anomalies in the ocean areas have been derived from a combination of Geos-3, Seasat and Geosat altimeter data and the ETOP05U bathymetric data; although gravity values are given for land areas, they have been primarily computed from the OSU91A potential coefficient model, which is complete to degree 360); one file containing a 0.125 degree gridded mean sea surface (in the mean tide system) in the same geographic region as the data given in the file above; one file containing 30-minute x 30-minute mean gravity anomalies and geoid undulations (in the tide-free system), derived from OSU's 0.125 degree gridded point anomalies and geoid undulations; and one file containing 1 degree x 1 degree mean gravity anomalies and geoid undulations (in the tide-free system), as derived from the original gridded point values. Principal gravity parameters include mean gravity anomaly and mean geoid undulations. The gravity anomaly computation uses the Geodetic Reference System 1967 (GRS 67) theoretical formula. The data are global in coverage where data are available.

SANDWELL: The high density Geosat/GM altimeter data south of 30 S have finally arrived. In addition, ERS-1 has completed more than 6 cycles of its 35-day repeat track. These data provide a dramatically improved view of the marine gravity field. The files in this directory contain global marine gravity anomalies gridded on a Mercator projection (see Sandwell and Smith, EOS Trans. AGU, v. 73, p. 133, Fall 1992 AGU meeting supplement). The grid was derived from the following data sources: Seasat - used in areas north of 30 S latitude (profiles within 10 km of a Geosat/ERM track were excluded); Geosat/ERM - average of 62 Geosat Exact Repeat Mission profiles; Geosat/GM - recently declassified Geosat Geodetic Mission data south of 30 S; ERS-1 - fast delivery IGDRs obtained from Bob Cheney at NOAA (six 35-day repeat cycles were used in the grid). ...
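As a hedged sketch of decoding the g30_UNIX.BIN grid according to the layout described above (big-endian "normal" byte order assumed for the two-byte integers; 1051 scan lines of 7201 values, 32767 for land, scale factor 0.01 mGal):

import numpy as np

raw = np.fromfile("g30_UNIX.BIN", dtype=">i2").reshape(1051, 7201)
grav = np.where(raw == 32767, np.nan, raw * 0.01)    # free-air gravity anomaly, mGal

lat = np.linspace(-30.0, -72.0, 1051)                # first scan line at 30 S, last at 72 S
lon = np.linspace(0.0, 360.0, 7201)                  # 0 E through 360 E
print(grav.shape, np.nanmin(grav), np.nanmax(grav))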