Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains all the raw data and raw images used in the paper titled 'Highly multi-mode hollow core fibres'. It is grouped into two folders, raw data and raw images. The raw data folder holds a number of .dat files containing alternating columns of wavelength and signal for the different transmission, cutback and bend loss measurements of the different fibres. The raw images folder holds .tif files of the different fibres, together with the near-field and far-field images used in Figure 2.
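A minimal sketch for loading one of these .dat files with numpy (the file name, whitespace delimiter, and wavelength units are assumptions; the files are described only as alternating columns of wavelength and signal):

import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("raw_data/fibre1_transmission.dat")  # hypothetical file name

# Columns alternate wavelength, signal, wavelength, signal, ...
for i in range(0, data.shape[1], 2):
    plt.plot(data[:, i], data[:, i + 1], label=f"measurement {i // 2}")
plt.xlabel("Wavelength (nm, assumed)")
plt.ylabel("Signal")
plt.legend()
plt.show()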
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General
For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.
Summary
A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains
30 completely labeled (segmented) images
71 partly labeled images
altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes 30–60 min on average, yet a difficult one can take up to 4 hours)
To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
A set of metrics and a novel ranking score for respective meaningful method benchmarking
An evaluation of three baseline methods in terms of the above metrics and score
Abstract
Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.
Dataset documentation:
We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:
FISBe Datasheet
Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.
Files
fisbe_v1.0_{completely,partly}.zip
contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.
fisbe_v1.0_mips.zip
maximum intensity projections of all samples, for convenience.
sample_list_per_split.txt
a simple list of all samples and the subset they are in, for convenience.
view_data.py
a simple python script to visualize samples, see below for more information on how to use it.
dim_neurons_val_and_test_sets.json
a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.
Readme.md
general information
How to work with the image files
Each sample consists of a single 3d MCFO image of neurons of the fruit fly. For each image, we provide a pixel-wise instance segmentation for all separable neurons. Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification). The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file. The segmentation mask for each neuron is stored in a separate channel. The order of dimensions is CZYX.
We recommend working in a virtual environment, e.g., using conda:
conda create -y -n flylight-env -c conda-forge python=3.9
conda activate flylight-env
How to open zarr files
Install the Python zarr package:
pip install zarr
Open a zarr file with:
import zarr
raw = zarr.open(<path_to_zarr_file>, mode='r', path="volumes/raw")
seg = zarr.open(<path_to_zarr_file>, mode='r', path="volumes/gt_instances")
Zarr arrays are read lazily on-demand. Many functions that expect numpy arrays also work with zarr arrays. Optionally, the arrays can also explicitly be converted to numpy arrays.
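For example (a minimal sketch; the zarr file name is the sample used further below):

import numpy as np
import zarr

raw = zarr.open("R9F03-20181030_62_B5.zarr", mode='r', path="volumes/raw")
print(raw.shape)          # shape and dtype are available without reading voxel data
raw_np = np.asarray(raw)  # explicit conversion loads the array into memory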
How to view zarr image files
We recommend using napari to view the image data.
Install napari:
pip install "napari[all]"
Save the following Python script:
import zarr, sys, napari
raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")
gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")
viewer = napari.Viewer(ndisplay=3)
for idx, gt in enumerate(gts):
    viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
napari.run()
Execute:
python view_data.py <path_to_data>/R9F03-20181030_62_B5.zarr
Metrics
S: Average of avF1 and C
avF1: Average F1 Score
C: Average ground truth coverage
clDice_TP: Average true positives clDice
FS: Number of false splits
FM: Number of false merges
tp: Relative number of true positives
For more information on our selected metrics and formal definitions please see our paper.
Baseline
To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al. For detailed information on the methods and the quantitative results please see our paper.
License
The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Citation
If you use FISBe in your research, please use the following BibTeX entry:
@misc{mais2024fisbe,
  title         = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures},
  author        = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
  year          = 2024,
  eprint        = {2404.00130},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
Acknowledgments
We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions. P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program. This work was co-funded by Helmholtz Imaging.
Changelog
There have been no changes to the dataset so far. All future changes will be listed on the changelog page.
Contributing
If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.
All contributions are welcome!
The do-file marital_spouselinks.do combines all data on people's marital statuses and reported spouses to create the following datasets:
1. all_marital_reports - a listing of all the times an individual has reported their current marital status, with the id numbers of the reported spouse(s); this listing is as reported, so it may include discrepancies (e.g. a 'Never married' status following a 'Married' one)
2. all_spouse_pairs_full - a listing of each time each spouse pair has been reported, plus summary information on co-residency for each pair
3. all_spouse_pairs_clean_summarised - this summarises the data from all_spouse_pairs_full to give start and end dates of unions
4. marital_status_episodes - this combines data from all the sources to create episodes of marital status; each has a start and end date and a marital status, and, if currently married, the spouse ids of the current spouse(s) if reported. There are several variables to indicate where each piece of information comes from.
The first 2 datasets are made available in case people need the 'raw' data for any reason (e.g. if they only want data from one study) or if they wish to summarise the data in a different way to what is done for the last 2 datasets.
The do-file is quite complicated, with many sources of data going through multiple processes to create variables in the datasets, so it is not always straightforward to explain in the documentation where each variable comes from. The 4 datasets build on each other and the do-file is documented throughout, so anyone wanting to understand it in great detail may be better off examining that. However, below is a brief description of how the datasets are created:
Marital status data are stored in the tables of the study they were collected in:
AHS Adult Health Study [ahs_ahs1]
CEN Census (initial CRS census) [cen_individ]
CENM In-migration (CRS migration form) [crs_cenm]
GP General form (filled for various reasons) [gp_gpform]
SEI Socio-economic individual (annual survey from 2007 onwards) [css_sei]
TBH TB household (study of household contacts of TB patients) [tb_tbh]
TBO TB controls (matched controls for TB patients) [tb_tbo & tb_tboto2007]
TBX TB cases (TB patients) [tb_tbx & tb_tbxto2007]
In many of the above surveys, people were asked to report, in addition to their current marital status, their current and past spouses along with (sometimes) some information about the marriage (start/end year etc.). These data are stored all together in the table gen_spouse, with variables indicating which study the data came from. Further evidence of spousal relationships is taken from gen_identity (if a couple appear as co-parents of a CRS member) and from crs_residency_episodes_clean_poly, a combined dataset (if they are living in the same household at the same time). Note that co-parent couples who are not reported in gen_spouse are only retained in the datasets if they have co-resident episodes.
The marital status data are appended together and the spouse id data merged in. Minimal data editing/cleaning is carried out. As the spouse data are in long format, this dataset is reshaped wide to have one line per marital status report (polygamy in the area allows for men to have multiple spouses at one time): this dataset is saved as all_marital_reports.
The list of reported spouses in gen_spouse is appended to a list of co-parents (from gen_identity), and this list is cleaned to try to identify and remove obvious id errors (incestuous links, same-sex links [these are not reported in this culture] and large age differences). Data reported by men and women are compared, and variables are created to show whether one or both members of the couple report the union. Many records have information on the start and end year of the marriage, and all have the date the union was reported. This listing is compared to data from residency episodes to add the dates that couples were living together (not all records have start/end dates, so this supplements them); in addition, the dates that each member of the couple was last known to be alive or first known to be dead are added (also from the residency data). This dataset, with all the records available for each spouse pair, is saved as all_spouse_pairs_full.
The date data from all_spouse_pairs_full are then summarised to get one line per couple with earliest and latest known married date for all, and, if available, marriage and separation date. For each date there are also variables created to indicate the source of the data.
As the culture only allows women one spouse at a time, records for women with 'overlapping' husbands are cleaned. This dataset is then saved as all_spouse_pairs_clean_summarised.
Both the cleaned spouse pairs and the cleaned marital status datasets are converted into episodes. For the spouse listing, the marriage date or first known married date is used as the beginning, and the last known married date plus a year, or the separation date, as the end. The marital status records are collapsed into periods over which the same status is reported (following some cleaning to remove impossible reports), the start date being the first of these reports and the end date being the last of the reports plus a year. These episodes are appended together and a series of processes is run several times to remove overlapping episodes. To be able to assign specific spouse ids to each married episode, some episodes need to be 'split' into more than one. For example, if a man is married to one woman from 2005 to 2017 and then marries another woman in 2008 and remains married to her till 2017, his initial married episode would run from 2005 to 2017, but it would need to be split into one episode from 2005 to 2008 with 1 idspouse attached and another from 2008 to 2017 with 2 idspouse attached. After this splitting process the spouse ids are merged in.
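To illustrate the splitting step, here is a minimal Python sketch of the logic described above (the actual processing is done in Stata in marital_spouselinks.do; the data structure and field names here are hypothetical):

def split_episode(episode, spouse_starts):
    # Cut one married episode at each year a new spouse is added,
    # then attach the ids of all spouses present in each piece.
    cuts = sorted({episode["start"], episode["end"],
                   *[y for y in spouse_starts.values()
                     if episode["start"] < y < episode["end"]]})
    pieces = []
    for start, end in zip(cuts[:-1], cuts[1:]):
        spouses = [sid for sid, y in spouse_starts.items() if y <= start]
        pieces.append({"start": start, "end": end, "idspouse": spouses})
    return pieces

# The example from the text: married to A from 2005, B joins in 2008.
print(split_episode({"start": 2005, "end": 2017}, {"A": 2005, "B": 2008}))
# -> [{'start': 2005, 'end': 2008, 'idspouse': ['A']},
#     {'start': 2008, 'end': 2017, 'idspouse': ['A', 'B']}]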
The final episode dataset is saved as marital_status_episodes.
Individual
Face-to-face [f2f]
Dataset Introduction
TFH_Annotated_Dataset is an annotated patent dataset pertaining to thin film head technology in hard disks. To the best of our knowledge, this is the second labeled patent dataset publicly available in the technology management domain that annotates both entities and the semantic relations between entities; the first one is [1].
The well-crafted information schema used for patent annotation contains 17 types of entities and 15 types of semantic relations as shown below.
Table 1 The specification of entity types
Type | Comment | Example |
---|---|---|
physical flow | substance that flows freely | The etchant solution has a suitable solvent additive such as glycerol or methyl cellulose |
information flow | information data | A camera using a film having a magnetic surface for recording magnetic data thereon |
energy flow | entity relevant to energy | Conductor is utilized for producing writing flux in magnetic yoke |
measurement | method of measuring something | The curing step takes place at the substrate temperature less than 200.degree |
value | numerical amount | The curing step takes place at the substrate temperature less than 200.degree |
location | place or position | The legs are thinner near the pole tip than in the back gap region |
state | particular condition at a specific time | The MR elements are biased to operate in a magnetically unsaturated mode |
effect | change caused by an innovation | Magnetic disk system permits accurate alignment of magnetic head with spaced tracks |
function | manufacturing technique or activity | A magnetic head having highly efficient write and read functions is thereby obtained |
shape | the external form or outline of something | Recess is filled with non-magnetic material such as glass |
component | a part or element of a machine | A pole face of yoke is adjacent edge of element remote from surface |
attribution | a quality or feature of something | A pole face of yoke is adjacent edge of element remote from surface |
consequence | the result caused by something or an activity | This prevents the slider substrate from electrostatic damage |
system | a set of things working together as a whole | A digital recording system utilizing a magnetoresistive transducer in a magnetic recording head |
material | the matter from which a thing is made | Interlayer may comprise material such as Ta |
scientific concept | terminology used in scientific theory | Peak intensity ratio represents an amount of hydrophilic radical |
other | does not belong to the above entity types | Pressure distribution across air bearing surface is substantially symmetrical side |
Table 2 The specification of relation types
Type | Comment | Example |
---|---|---|
spatial relation | specify how one entity is located in relation to others | Gap spacer material is then deposited on the film knife-edge |
part-of | the ownership between two entities | a magnetic head has a magnetoresistive element |
causative relation | one entity operates as a cause of the other entity | Pressure pad carried another arm of spring urges film into contact with head |
operation | specify the relation between an activity and its object | Heat treatment improves the (100) orientation |
made-of | one entity is the material for making the other entity | The thin film head includes a substrate of electrically insulative material |
instance-of | the relation between a class and its instance | At least one of the magnetic layer is a free layer |
attribution | one entity is an attribution of the other entity | The thin film has very high heat resistance of remaining stable at 700.degree |
generating | one entity generates another entity | Buffer layer resistor create impedance that noise introduced to head from disk of drive |
purpose | relation between reason/result | conductor is utilized for producing writing flux in magnetic yoke |
in-manner-of | do something in certain way | The linear array is angled at a skew angle |
alias | one entity is also known under another entity’s name | The bias structure includes an antiferromagnetic layer AFM |
formation | an entity acts as a role of the other entity | Windings are joined at end to form center tapped winding |
comparison | compare one entity to the other | First end is closer to recording media use than second end |
measurement | one entity acts as a way to measure the other entity | This provides a relative permeance of at least 1000 |
other | does not belong to the above types | Then, MR resistance estimate during polishing step is calculated from S value and K value |
There are 1010 patent abstracts with 3,986 sentences in this corpus. We use a web-based annotation tool named Brat [2] for data labeling, and the annotated data is saved in '.ann' format. The benefit of '.ann' is that you can display and manipulate the annotated data once TFH_Annotated_Dataset.zip is unzipped under the corresponding repository of Brat.
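The .ann files follow the standard Brat standoff format, so they can also be read directly. The following minimal sketch parses the entity and relation mentions of one file (the file name is hypothetical, and discontinuous entity spans are ignored for brevity):

# Parse one Brat .ann file: entity lines start with "T", relation lines with "R".
entities, relations = {}, []
with open("TFH_Annotated_Dataset/patent_0001.ann", encoding="utf-8") as f:
    for line in f:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("T"):      # entity mention: id, "type start end", text
            etype, start, end = fields[1].split(" ")[:3]
            entities[fields[0]] = (etype, int(start), int(end), fields[2])
        elif fields[0].startswith("R"):    # relation mention: id, "type Arg1:Tx Arg2:Ty"
            rtype, arg1, arg2 = fields[1].split(" ")
            relations.append((rtype, arg1.split(":")[1], arg2.split(":")[1]))
print(len(entities), "entities,", len(relations), "relations")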
TFH_Annotated_Dataset contains 22,833 entity mentions and 17,412 semantic relation mentions. With TFH_Annotated_Dataset, we run two information extraction tasks: named entity recognition with BiLSTM-CRF [3] and semantic relation extraction with BiGRU-2ATTENTION [4]. To improve the semantic representation of patent language, the word embeddings were trained on the abstracts of 46,302 patents regarding magnetic heads in hard disk drives, which turns out to improve the performance of named entity recognition by 0.3% and of semantic relation extraction by about 2% in weighted-average F1, compared to GloVe and the patent word embeddings provided by Risch et al. [5].
For named entity recognition, the weighted-average precision, recall, and F1-value of BiLSTM-CRF on the entity level for the test set are 78.5%, 78.0%, and 78.2%, respectively. Although such performance is acceptable, it is still more than 10% lower in F1-value than its performance on general-purpose datasets. The main reason is the limited amount of labeled data.
The precision, recall, and F1-value for each type of entity are shown in Fig. 4. As to relation extraction, the weighted-average precision, recall, and F1-value of BiGRU-2ATTENTION for the test set are 89.7%, 87.9%, and 88.6% with no_edge relations, and 32.3%, 41.5%, and 36.3% without no_edge relations.
Academic citing
Chen, L., Xu, S.*, Zhu, L. et al. A deep learning based method for extracting semantic information from patent documents. Scientometrics 125, 289–312 (2020). https://doi.org/10.1007/s11192-020-03634-y
Paper link https://link.springer.com/article/10.1007/s11192-020-03634-y
REFERENCE
[1] Pérez-Pérez, M., Pérez-Rodríguez, G., Vazquez, M., Fdez-Riverola, F., Oyarzabal, J., Valencia, A., Lourenço, A., & Krallinger, M. (2017). Evaluation of chemical and gene/protein entity recognition systems at BioCreative V.5: The CEMP and GPRO patents tracks. In Proceedings of the BioCreative V.5 challenge evaluation workshop, pp. 11–18.
[2] Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., & Tsujii, J. I. (2012). BRAT: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics (pp. 102-107)
[3] Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.
[4] Han, X., Gao, T., Yao, Y., Ye, D., Liu, Z., & Sun, M. (2019). OpenNRE: An open and extensible toolkit for neural relation extraction. arXiv preprint arXiv:1909.13078.
[5] Risch, J., & Krestel, R. (2019). Domain-specific word embeddings for patent classification. Data Technologies and Applications, 53(1), 108–122.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This training dataset was calculated using the mechanistic modeling approach. See the “Benchmark Synthetic Training Data for Artificial Intelligence-based Li-ion Diagnosis and Prognosis“ publication for more details. More details will be added when published. The prognosis dataset was harder to define as there are no limits on how the three degradation modes can evolve. For this proof of concept work, we considered eight parameters to scan. For each degradation mode, degradation was chosen to follow equation (1).
%degradation = a × cycle + (exp(b × cycle) − 1)   (1)
Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added: a delay for the exponential factor for LLI, and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating cannot be explained by an increase of LAMs or resistance [55]. The chosen parameters and their values are summarized in Table S1 and their evolution is represented in Figure S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles, to guarantee at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive and, after 3,000 cycles, the lowest degradation is 3%. The reversibility factor p8 was applied with equation (2) when LAM_NE > PT.
%LLI = %LLI + p8 × (LAM_NE − PT)   (2)
Where PT was calculated with equation (3) from [60].
PT = 100 − ((100 − LAM_PE) / (100 × LR_ini − LAM_PE)) × (100 − OFS_ini − LLI)   (3)
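As a minimal sketch, equations (1)-(3) can be written out as follows (assuming percentages on a 0-100 scale, and that the LAM term in equation (2) is the negative-electrode one, consistent with the LAM_NE > PT condition above):

import numpy as np

def degradation(cycle, a, b):
    # Equation (1): %degradation = a*cycle + (exp(b*cycle) - 1)
    return a * cycle + (np.exp(b * cycle) - 1)

def plating_threshold(LAM_PE, LLI, LR_ini, OFS_ini):
    # Equation (3), from [60]
    return 100 - ((100 - LAM_PE) / (100 * LR_ini - LAM_PE)) * (100 - OFS_ini - LLI)

def apply_plating_reversibility(LLI, LAM_NE, PT, p8):
    # Equation (2): extra LLI from plating, applied only when LAM_NE > PT
    return LLI + p8 * (LAM_NE - PT) if LAM_NE > PT else LLI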
Varying all those parameters accounted for more than 130,000 individual duty cycles, with one voltage curve for every 100 cycles. Six MATLAB© .mat files are included. The GIC-LFP_duty_other.mat file contains 12 variables:
Qnorm: normalized capacity scale for all voltage curves
p1 to p8: values used to generate the duty cycles
Key: index of which values were used for each degradation path (1 = p1, …, 8 = p8)
QL: capacity loss, one line per path, one column per 100 cycles.
File GIC-LFP_duty_LLI-LAMsvalues.mat contains the values for LLI, LAMPE and LAMNE for all cycles (1 line per 100 cycles) and duty cycles (columns).
Files GIC-LFP_duty_1 to _4 contain the voltage data split into 1 GB chunks (40,000 simulations). Each cell corresponds to 1 line in the Key variable. Inside each cell, there is one column per 100 cycles.
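The .mat files can be opened in MATLAB directly; in Python, a minimal sketch (assuming the files are not in the v7.3/HDF5 format, which would require h5py instead of scipy):

from scipy.io import loadmat

other = loadmat("GIC-LFP_duty_other.mat")
print(sorted(k for k in other if not k.startswith("__")))  # list the 12 variables

Qnorm = other["Qnorm"].squeeze()  # normalized capacity scale
QL = other["QL"]                  # capacity loss: one line per path, one column per 100 cycles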
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
Summary of various climate variables for all 15 subregions, based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids, including:
Time series mean annual BAWAP rainfall from 1900 - 2012.
Long term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month
Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.
Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).
As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
There are 4 csv files here:
BAWAP_P_annual_BA_SYB_GLO.csv
Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.
Source data: annual BILO rainfall
P_PET_monthly_BA_SYB_GLO.csv
long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month
Climatology_Trend_BA_SYB_GLO.csv
Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend
Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv
Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
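A minimal sketch for reading these files with pandas (the exact column names are not specified in this metadata, so inspect the headers after loading):

import pandas as pd

rain = pd.read_csv("BAWAP_P_annual_BA_SYB_GLO.csv")     # annual rainfall, 1900-2012
clim = pd.read_csv("Climatology_Trend_BA_SYB_GLO.csv")  # stats for 17 periods x 8 variables
print(rain.columns.tolist())
print(clim.head())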
Dataset was created from various BAWAP source data, including Monthly BAWAP rainfall, Tmax, Tmin, VPD, etc, and other source data including monthly Penman PET, Correlation coefficient data. Data were extracted from national datasets for the GLO subregion.
Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Bioregional Assessment areas v03
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gearboxes in real industrial settings often operate under variable working conditions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Drone onboard multi-modal sensor dataset:
This dataset contains timeseries data from numerous drone flights. Each flight record has a unique identifier (uid) and a timestamp indicating when the flight occurred. The drone's position is represented by the coordinates (position_x, position_y, position_z) and altitude. The orientation of the drone is represented by the quaternion (orientation_x, orientation_y, orientation_z, orientation_w). The drone's velocity and angular velocity are represented by (velocity_x, velocity_y, velocity_z) and (angular_x, angular_y, angular_z) respectively. The linear acceleration of the drone is represented by (linear_acceleration_x, linear_acceleration_y, linear_acceleration_z).
In addition to the above, the dataset also contains information about the battery voltage (battery_voltage) and current (battery_current) and the payload attached. The payload information indicates whether the drone operated with an embedded device attached (NVIDIA Jetson), various sensors, and a solid-state weather station (Trisonica).
The dataset also includes annotations for the current state of the drone, including IDLE_HOVER, ASCEND, TURN, HMSL and DESCEND. These states can be used for classification to identify the current state of the drone. Furthermore, the labeled dataset can be used for predicting the trajectory of the drone using multi-task learning.
For the annotation, we look at the changes in position_x, position_y, position_z and yaw. Specifically, if position_x or position_y changes, the drone is moving in a horizontal straight line; if position_z changes, the drone is ascending or descending (depending on whether it increases or decreases); if the yaw changes, the drone is performing a turn; and if none of the above features change, the drone is in idle or hover mode.
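A minimal sketch of this annotation rule (the threshold and the assumption that HMSL denotes horizontal straight-line movement are ours; the dataset's own labels are IDLE_HOVER, ASCEND, TURN, HMSL and DESCEND):

def annotate(dx, dy, dz, dyaw, eps=1e-3):
    # Label one time step from the changes in position and yaw.
    if abs(dx) > eps or abs(dy) > eps:
        return "HMSL"        # horizontal straight-line movement (assumed meaning)
    if abs(dz) > eps:
        return "ASCEND" if dz > 0 else "DESCEND"
    if abs(dyaw) > eps:
        return "TURN"
    return "IDLE_HOVER"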
In addition to the features already mentioned, this dataset also includes data from various sensors, including a weather station and an Inertial Measurement Unit (IMU). The weather station provides information about the weather conditions during the flight, including wind speed and wind angle. These weather variables could be important factors influencing the flight of the drone and battery consumption. The IMU is a sensor that measures the drone's acceleration, angular velocity, and magnetic field. The accelerometer provides information about the drone's linear acceleration, while the gyroscope provides information about the drone's angular velocity. The magnetometer measures the Earth's magnetic field, which can be used to determine the drone's orientation.
Field deployments were performed in order to collect empirical data using a specific type of drone, a DJI Matrice 300 (M300). The M300 is equipped with advanced sensors and flight control systems, which can provide high-precision flight data. The flights were designed to cover a range of flight patterns: triangular, square, polygonal, and random. These flight patterns were chosen to represent a variety of flight scenarios that could be encountered in real-world applications. The triangular flight pattern consists of the drone flying in a triangular path at a fixed altitude; likewise, the square, polygonal, and random flight patterns involve the drone flying in a square, polygonal, or random path, respectively, each at a fixed altitude. Overall, this dataset contains a rich set of flight data that can be used for various research purposes, including developing and testing algorithms for drone control, trajectory planning, and machine learning.
The GINGAMODE database table contains selected information from the Large Area Counter (LAC) aboard the third Japanese X-ray astronomy satellite Ginga. The Ginga experiment began on day 36, 5 February 1987 and ended in November 1991. Ginga consisted of the LAC, the all-sky monitor (ASM) and the gamma-ray burst detector (GBD). The satellite was in a circular orbit at 31 degree inclination with apogee 670 km and perigee 510 km, and with a period of 96 minutes. A Ginga observation consisted of varying numbers of major frames which had lengths of 4, 32, or 128 seconds, depending on the setting of the bitrate. Each GINGAMODE database entry consists of data from the first record of a series of observations having the same values of the following: "BITRATE", "LACMODE", "DISCRIMINATOR", or "ACS MONITOR". When any of these changed, a new entry was written into GINGAMODE. The other Ginga catalog database, GINGALOG, is also a subset of the same LAC dump file used to create GINGAMODE. GINGALOG contains a listing only whenever the "ACS monitor" (Attitude Control System) changes. Thus, GINGAMODE monitors changes in four parameters and GINGALOG is a basic log database mapping the individual FITS files. Ginga FITS files may have more than one entry in the GINGAMODE database. Both databases point to the same archived Flexible Image Transport System (FITS) files created from the LAC dump files. The user is invited to browse through the observations available from Ginga using GINGALOG or GINGAMODE, then extract the FITS files for more detailed analysis. The Ginga LAC Mode Catalog was prepared from data sent to NASA/GSFC from the Institute of Space and Astronautical Science (ISAS) in Japan.
Duplicate entries were removed from the HEASARC implementation of this catalog in June 2019. This is a service provided by NASA HEASARC.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘🍷 Alcohol vs Life Expectancy’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/alcohol-vs-life-expectancye on 13 February 2022.
--- Dataset description provided by original source is as follows ---
There is a surprising relationship between alcohol consumption and life expectancy. In fact, the data suggest that life expectancy and alcohol consumption are positively correlated - 1.2 additional years for every 1 liter of alcohol consumed annually. This is, of course, a spurious finding, because the correlation of this relationship is very low - 0.28. This indicates that other factors in those countries where alcohol consumption is comparatively high or low are contributing to differences in life expectancy, and further analysis is warranted.
Plot: https://data.world/api/databeats/dataset/alcohol-vs-life-expectancy/file/raw/LifeExpectancy_v_AlcoholConsumption_Plot.jpg
The original drinks.csv file in the UNCC/DSBA-6100 dataset was missing values for The Bahamas, Denmark, and Macedonia for the wine, spirits, and beer attributes, respectively. Drinks_solution.csv shows these values filled in, for which I used the MEAN of the rest of the data column.
Other methods were considered and ruled out:
Filling missing values with ZERO - The missing values fall in the serving columns (beer_servings, spirit_servings, and wine_servings), and upon reviewing the Bahamas, Denmark, and Macedonia more closely, it is apparent that 0 would be a poor choice for the missing values, as all three countries clearly consume alcohol.
Filling missing values with MEAN - In the case of the drinks dataset, this is the best approach. The MEAN averages for the columns happen to be very close to the actual data from where we sourced this exercise. In addition, the MEAN will not skew the data, which the prior approaches would do.
The original drinks.csv dataset also had an empty data column: total_litres_of_pure_alcohol. This column needed to be calculated in order to do a simple 2D plot and trendline. It would have been possible to instead run a multi-variable regression on the data and therefore skip this step, but this adds an extra layer of complication to understanding the analysis - not to mention the point of the exercise is to go through an example of calculating new attributes (or "feature engineering") using domain knowledge.
The graphic found at the Wikipedia / Standard Drink page shows the following breakdown:
The conversion factor from fl oz to L is 1 fl oz : 0.0295735 L
Therefore, the following formula was used to compute the empty column:
total_litres_of_pure_alcohol = (beer_servings × 12 fl oz per serving × 0.05 ABV + spirit_servings × 1.5 fl oz × 0.40 ABV + wine_servings × 5 fl oz × 0.12 ABV) × 0.0295735 L per fl oz
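A minimal pandas sketch of this feature-engineering step (column names as given above; the file name assumes the cleaned solution file):

import pandas as pd

drinks = pd.read_csv("drinks_solution.csv")
FL_OZ_TO_L = 0.0295735
drinks["total_litres_of_pure_alcohol"] = (
    drinks["beer_servings"] * 12 * 0.05       # beer: 12 fl oz per serving at 5% ABV
    + drinks["spirit_servings"] * 1.5 * 0.40  # spirits: 1.5 fl oz at 40% ABV
    + drinks["wine_servings"] * 5 * 0.12      # wine: 5 fl oz at 12% ABV
) * FL_OZ_TO_L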
The lifeexpectancy.csv datafile in the https://data.world/uncc-dsba/dsba-6100-fall-2016 dataset contains life expectancy data for each country. The following query will join this data to the cleaned drinks.csv data file:
# Life Expectancy vs Alcohol Consumption
PREFIX drinks: <http://data.world/databeats/alcohol-vs-life-expectancy/drinks_solution.csv/drinks_solution#>
PREFIX life: <http://data.world/uncc-dsba/dsba-6100-fall-2016/lifeexpectancy.csv/lifeexpectancy#>
PREFIX countries: <http://data.world/databeats/alcohol-vs-life-expectancy/countryTable.csv/countryTable#>
SELECT ?country ?alc ?years
WHERE {
SERVICE <https://query.data.world/sparql/databeats/alcohol-vs-life-expectancy> {
?r1 drinks:total_litres_of_pure_alcohol ?alc .
?r1 drinks:country ?country .
?r2 countries:drinksCountry ?country .
?r2 countries:leCountry ?leCountry .
}
SERVICE <https://query.data.world/sparql/uncc-dsba/dsba-6100-fall-2016> {
?r3 life:CountryDisplay ?leCountry .
?r3 life:GhoCode ?gho_code .
?r3 life:Numeric ?years .
?r3 life:YearCode ?reporting_year .
?r3 life:SexDisplay ?sex .
}
FILTER ( ?gho_code = "WHOSIS_000001" && ?reporting_year = 2013 && ?sex = "Both sexes" )
}
ORDER BY ?country
The resulting joined data can then be saved to local disk and imported into any analysis tool like Excel, Numbers, R, etc. to make a simple scatterplot. A trendline and R^2 should be added to determine the relationship between Alcohol Consumption and Life Expectancy (if any).
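For the last step, a minimal sketch of the trendline fit (assuming the query results were exported to a CSV with the columns from the SELECT clause above):

import pandas as pd
from scipy.stats import linregress

joined = pd.read_csv("joined_results.csv")  # hypothetical export of the query results
fit = linregress(joined["alc"], joined["years"])
print(f"slope = {fit.slope:.2f} years per litre, R^2 = {fit.rvalue**2:.2f}")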
This dataset was created by Jonathan Ortiz and contains around 200 samples along with Beer Servings, Spirit Servings, technical information and other features such as:
- Total Litres Of Pure Alcohol
- Wine Servings
- and more
- Analyze Beer Servings in relation to Spirit Servings
- Study the influence of Total Litres Of Pure Alcohol on Wine Servings
- More datasets
If you use this dataset in your research, please credit Jonathan Ortiz
--- Original source retains full ownership of the source dataset ---
As part of an effort to monitor electricity usage by plug loads in a new high performance office building, plug load management devices were deployed to enable data collection, analysis, and active control of plug loads. We used a Commercial Off-The-Shelf (COTS) plug load management system to capture relevant data for two different types of multi-function devices (MFDs) in the facility, one of which was tested for use with different power settings. This enabled a quantitative analysis to assess impacts on energy consumption. It was found that a projected 65% reduction in annual energy consumption would result by using a newer, Energy Star compliant model of MFD, and an additional projected 39% reduction in annual energy consumption would result by subsequently changing the time-to-sleep for that MFD. It was also found that it may be beneficial to apply automated analysis with anomaly detection algorithms to detect problems with MFD performance, such as a failure to go to sleep mode or variations in sleep power draw. Furthermore, we observed that energy savings realized by using plug load management devices to de-energize (unplug) MFDs during non-business hours depends on the sleep power draw and time-to-sleep setting. For the MFDs in this study with settings established per the maintenance contract (which were different than factory default values), turning the device off at night and then on in the morning used more energy than leaving it on in sleep mode due to the start-up behavior and excessive time-to-sleep setting of four hours. From this and other assessments, we offer these recommendations to building occupants: reduce MFD time-to-sleep, encourage employees to use the power save button, and apply automated analysis to detect problems with device performance.
How many people are staying at home? How far are people traveling when they don’t stay home? Which states and counties have more people taking trips? The Bureau of Transportation Statistics (BTS) now provides answers to those questions through our new mobility statistics. The Trips by Distance data and the number of people staying home and not staying home are estimated for the Bureau of Transportation Statistics by the Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland.

The travel statistics are produced from an anonymized national panel of mobile device data from multiple sources. All data sources used in the creation of the metrics contain no personal information. Data analysis is conducted at the aggregate national, state, and county levels. A weighting procedure expands the sample of millions of mobile devices, so the results are representative of the entire population in a nation, state, or county. To assure confidentiality and support data quality, no data are reported for a county if it has fewer than 50 devices in the sample on any given day.

Trips are defined as movements that include a stay of longer than 10 minutes at an anonymized location away from home. Home locations are imputed on a weekly basis. A movement with multiple stays of longer than 10 minutes before returning home is counted as multiple trips. Trips capture travel by all modes of transportation, including driving, rail, transit, and air.

The daily travel estimates are from a mobile device data panel from merged multiple data sources that address the geographic and temporal sample variation issues often observed in a single data source. The merged data panel only includes mobile devices whose anonymized location data meet a set of data quality standards, which further ensures the overall data quality and consistency. The data quality standards consider both the temporal frequency and spatial accuracy of anonymized location point observations, temporal coverage and representativeness at the device level, spatial representativeness at the sample and county level, etc. A multi-level weighting method that employs both device- and trip-level weights expands the sample to the underlying population at the county and state levels, before travel statistics are computed.

These data are experimental and may not meet all of our quality standards. Experimental data products are created using new data sources or methodologies that benefit data users in the absence of other relevant products. We are seeking feedback from data users and stakeholders on the quality and usefulness of these new products. Experimental data products that meet our quality standards and demonstrate sufficient user demand may enter regular production if resources permit.
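A minimal sketch of the trip definition above (the data structure is hypothetical; the actual BTS pipeline is of course far more involved):

def count_trips(stays, min_minutes=10):
    # stays: (location, duration_minutes) tuples between leaving and returning home;
    # each stay of longer than 10 minutes away from home counts as one trip.
    return sum(1 for loc, minutes in stays if loc != "home" and minutes > min_minutes)

print(count_trips([("office", 480), ("shop", 25), ("fuel_station", 5)]))  # -> 2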
The GHS is an annual household survey specifically designed to measure the living circumstances of South African households. The GHS collects data on education, employment, health, housing and household access to services.
The survey is representative at national level and at provincial level.
Households and individuals
The survey covered all de jure household members (usual residents) of households in the nine provinces of South Africa and residents in workers' hostels. The survey does not cover collective living quarters such as students' hostels, old age homes, hospitals, prisons and military barracks.
Sample survey data
A multi-stage, stratified random sample was drawn using probability-proportional-to-size principles. First level stratification was based on province and second-tier stratification on district council.
Face-to-face [f2f]
GHS uses questionnaires as data collection instruments
In GHS 2009-2010:
The variable on care provision (Q129acre) in the GHS 2009 and 2010 should be used with caution. The question to collect the data (question 1.29a) asks:
"Does anyone in this household personally provide care for at least two hours per day to someone in the household who - owing to frailty, old age, disability, or ill-health cannot manage without help?"
Response codes (in the questionnaire, metadata, and dataset) are:
1 = No
2 = Yes, 2-19 hours per week
3 = Yes, 20-49 hours per week
4 = Yes, 50+ hours per week
5 = Do not know
There is an inconsistency between the question, which asks about hours per day, and the response options, which record hours per week. The result is that a respondent who provides care for one hour per day (7 hours per week) would presumably not answer this question affirmatively. Someone providing care for 13 hours a week would likewise be excluded, even though that amounts to substantial caregiving.
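Analysts recoding this variable may find a minimal sketch useful; the variable name Q129acre comes from the documentation above, while the data frame and file names are hypothetical.

    import pandas as pd

    # Response labels as printed in the questionnaire (codes from the text above).
    care_labels = {
        1: "No",
        2: "Yes, 2-19 hours per week",
        3: "Yes, 20-49 hours per week",
        4: "Yes, 50+ hours per week",
        5: "Do not know",
    }

    # Hypothetical file name; adjust to the actual GHS extract.
    df = pd.read_csv("ghs_2009_persons.csv")
    df["care_label"] = df["Q129acre"].map(care_labels)
    # Codes 2-4 indicate some reported care provision.
    df["provides_care"] = df["Q129acre"].isin([2, 3, 4])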
In GHS 2009-2015:
The variable on land size in the General Household Survey questionnaire for 2009-2015 should be used with caution. The data comes from questions on the households' agricultural activities in Section 8 of the GHS questionnaire: Household Livelihoods: Agricultural Activities. Question 8.8b asks:
“Approximately how big is the land that the household use for production? Estimate total area if more than one piece.” One of the response categories is worded as:
1 = Less than 500m2 (approximately one soccer field)
However, a soccer field is approximately 5,000 m2, not 500 m2, so response category 1 is incorrect; the correct option should read 5,000 m2. This response option is correct in GHS 2002-2008 and was flagged and corrected by Statistics SA in the GHS 2016.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Summary:
Estimated stand-off distance between ADS-B equipped aircraft and obstacles. Obstacle information was sourced from the FAA Digital Obstacle File and the FHWA National Bridge Inventory. Aircraft tracks were sourced from processed data curated from the OpenSky Network. Results are presented as histograms organized by aircraft type and distance away from runways.
Description:
For many aviation safety studies, aircraft behavior is represented using encounter models, which are statistical models of how aircraft behave during close encounters. They provide a realistic representation of the range of encounter flight dynamics within which an aircraft collision avoidance system would be likely to alert. These models have historically been limited to interactions between aircraft; they have not represented the specific interactions between obstacles and transponder-equipped aircraft. In response, we calculated the standoff distance between obstacles and ADS-B equipped manned aircraft.
For robustness, MIT LL calculated the standoff distance using two different datasets of manned aircraft tracks and two datasets of obstacles. This approach aligned with the foundational research used to support the ASTM F3442/F3442M-20 well clear criteria of 2000 feet laterally and 250 feet AGL vertically.
Both datasets of processed tracks of ADS-B equipped aircraft were curated from the OpenSky Network. It is likely that rotorcraft were underrepresented in these datasets, and there was no consideration of aircraft equipped only with Mode C or without any transponder. The first dataset, referred to as the “Monday” dataset, was used to train the v1.3 uncorrelated encounter models. The second dataset, referred to as the “aerodrome” dataset, was used to train the v2.0 and v3.x terminal encounter models. The Monday dataset consisted of 104 Mondays across North America. The aerodrome dataset was based on observations within at least 8 nautical miles of Class B, C, and D aerodromes in the United States for the first 14 days of each month from January 2019 through February 2020. Prior to any processing, the datasets required 714 and 847 gigabytes of storage, respectively. For more details on these datasets, please refer to "Correlated Bayesian Model of Aircraft Encounters in the Terminal Area Given a Straight Takeoff or Landing" and “Benchmarking the Processing of Aircraft Tracks with Triples Mode and Self-Scheduling.”
Two different datasets of obstacles were also considered. The first was point obstacles defined by the FAA digital obstacle file (DOF), consisting of point obstacle structures of antenna, lighthouse, meteorological tower (met), monument, sign, silo, spire (steeple), stack (chimney; industrial smokestack), transmission line tower (t-l tower), tank (water; fuel), tramway, utility pole (telephone pole, or pole of similar height, supporting wires), windmill (wind turbine), and windsock. Each obstacle was represented by a cylinder with the height reported by the DOF and a radius based on the reported horizontal accuracy. We did not consider the actual width and height of the structure itself. Additionally, we only considered obstacles at least 50 feet tall and marked as verified in the DOF.
The other obstacle dataset, termed “bridges,” was based on the bridges identified in the FAA DOF and additional information provided by the National Bridge Inventory (NBI). Due to the potential size and extent of bridges, it would not be appropriate to model them as point obstacles; however, the FAA DOF only provides a point location and no information about the size of the bridge. In response, we correlated the FAA DOF with the National Bridge Inventory, which provides information about the length of many bridges. Instead of sizing the simulated bridge based on horizontal accuracy, as with the point obstacles, the bridges were represented as circles with a radius based on the length of the longest nearby bridge from the NBI. A circle representation was required because neither the FAA DOF nor the NBI provides sufficient information about orientation to represent bridges as rectangular cuboids. Similar to the point obstacles, the height of the obstacle was based on the height reported by the FAA DOF. Accordingly, the analysis using the bridge dataset should be viewed as risk-averse and conservative: it is possible that a manned aircraft was in actuality hundreds of feet away from an obstacle while the estimated standoff distance was significantly less. Additionally, because all obstacles are represented with a fixed height, the potentially flat and low-level entrances of a bridge are assumed to have the same height as the tall bridge towers. The attached figure illustrates an example simulated bridge.
It would have been extremely computationally inefficient to calculate the standoff distance for all possible track points. Instead, we defined an encounter between an aircraft and an obstacle as an aircraft flying at 3069 feet AGL or less coming within 3000 feet laterally of any obstacle within a 60 second time interval. If these criteria were satisfied, then for that 60 second track segment we calculated the standoff distance to all nearby obstacles. Vertical separation was based on the MSL altitude of the track and the maximum MSL height of the obstacle.
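A minimal sketch of this encounter screen is given below. It is illustrative only, not the processing code used for the dataset; the track-point and obstacle structures, and the flat-earth distance approximation, are our own assumptions.

    import math

    FT_PER_DEG_LAT = 364000  # rough feet per degree of latitude (flat-earth approximation)

    def lateral_ft(lat1, lon1, lat2, lon2):
        # Approximate lateral distance in feet between two nearby points.
        dlat = (lat2 - lat1) * FT_PER_DEG_LAT
        dlon = (lon2 - lon1) * FT_PER_DEG_LAT * math.cos(math.radians(lat1))
        return math.hypot(dlat, dlon)

    def is_encounter(segment, obstacle, max_agl_ft=3069, max_lateral_ft=3000):
        # segment: track points spanning a 60 second interval, each a dict
        # with lat, lon, and alt_agl_ft; obstacle: a dict with lat and lon.
        return any(
            p["alt_agl_ft"] <= max_agl_ft
            and lateral_ft(p["lat"], p["lon"], obstacle["lat"], obstacle["lon"]) <= max_lateral_ft
            for p in segment
        )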
For each combination of aircraft track and obstacle datasets, the results were organized in seven different ways, with filtering criteria based on aircraft type and distance away from runways. Runway data were sourced from the FAA runways of the United States, Puerto Rico, and Virgin Islands open dataset. Aircraft type was identified as part of the em-processing-opensky workflow.
License
This dataset is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).
This license requires that reusers give credit to the creator. It allows reusers to copy and distribute the material in any medium or format, in unadapted form and for noncommercial purposes only. Noncommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Exceptions are given for the not-for-profit standards organizations ASTM International and RTCA.
MIT is releasing this dataset in good faith to promote open and transparent research of the low altitude airspace. Given the limitations of the dataset and the need for more research, a more restrictive license was warranted. Namely, the dataset is based only on observations of ADS-B equipped aircraft, which not all aircraft in the airspace are required to employ, and the observations were sourced from a crowdsourced network whose surveillance coverage has not been robustly characterized.
As more research is conducted and the low altitude airspace is further characterized or regulated, it is expected that a future version of this dataset may have a more permissive license.
Distribution Statement
DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.
© 2021 Massachusetts Institute of Technology.
Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.
This material is based upon work supported by the Federal Aviation Administration under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Federal Aviation Administration.
This document is derived from work done for the FAA (and possibly others); it is not the direct product of work done for the FAA. The information provided herein may include content supplied by third parties. Although the data and information contained herein have been produced or processed from sources believed to be reliable, the Federal Aviation Administration makes no warranty, expressed or implied, regarding the accuracy, adequacy, completeness, legality, reliability or usefulness of any information, conclusions or recommendations provided herein. Distribution of the information contained herein does not constitute an endorsement or warranty of the data or information provided herein by the Federal Aviation Administration or the U.S. Department of Transportation. Neither the Federal Aviation Administration nor the U.S. Department of Transportation shall be held liable for any improper or incorrect use of the information contained herein.
https://vocab.nerc.ac.uk/collection/L08/current/UN/
The dataset contains 39,148 years of sea level data from 1,355 station records, with some stations having alternative versions of the records provided from different sources. GESLA-2 data may be obtained from www.gesla.org; the site also contains the file format description and other information. The text files contain headers with lines of metadata, followed by the data itself in a simple column format.
All the tide gauge data in GESLA-2 have hourly or more frequent sampling. The basic data from the US National Oceanic and Atmospheric Administration (NOAA) are 6-minute values, but for GESLA-2 purposes we instead settled on their readily-available 'verified hourly values'. Most UK records are also hourly values up to the 1990s, and 15-minute values thereafter. Records from some other sources may have different sampling, and records should be inspected individually if sampling considerations are critical to an analysis.
The GESLA-2 dataset has global coverage and better geographical coverage than GESLA-1, with stations in new regions (defined by stations in the new dataset located more than 50 km from any station in GESLA-1). For example, major improvements have been made for the Mediterranean and Baltic Seas, Japan, New Zealand and the African coastline south of the Equator. The earliest measurements are from Brest, France (04/01/1846) and the latest from Cuxhaven, Germany and Esbjerg, Denmark (01/05/2015). There are 29 years in an average record, although the actual number of years varies from only 1 at short-lived sites to 167 in the case of Brest, France. Most of the measurements in GESLA-2 were made during the second half of the twentieth century, so the most globally-representative analyses of sea level variability with GESLA-2 will be those that focus on the period since about 1970.
Historically, delayed-mode data comprised spot values of sea level every hour, obtained from inspection of the ink trace on a tide gauge chart. Nowadays tide gauge data loggers provide data electronically. Data can be either spot values, integrated (averaged) values over specified periods (e.g. 6 minutes), or integrated over a specified period within a longer sampling period (e.g. averaged over 3 minutes every 6 minutes).
The construction of this dataset is fundamental to research in sea level variability and also to practical aspects of coastal engineering. One component is concerned with encouraging countries to install tide gauges at locations where none exist, to operate them to internationally agreed standards, and to make the data available to interested users. A second component is concerned with the collection of data from the global set of tide gauges, whether the gauges have originated through the GLOSS programme or not, and with making the data available.
The records in GESLA-2 will have had some form of quality control undertaken by the data providers. However, the extent of that control will inevitably vary between providers and with time. In most cases, no further quality control has been made beyond that already undertaken by the data providers. Although there are many individual contributions, over a quarter of the station-years are provided by the research quality dataset of the UHSLC.
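A minimal reading sketch is given below, assuming that the metadata header lines begin with '#' and that the data columns are date, time, sea level, and two flag columns; both assumptions should be verified against the format description at www.gesla.org.

    import pandas as pd

    # Hedged sketch: the header prefix and column layout are assumptions
    # to be checked against the GESLA-2 format description.
    df = pd.read_csv(
        "station_record.txt",
        comment="#",           # skip metadata header lines
        sep=r"\s+",            # simple whitespace-delimited columns
        header=None,
        names=["date", "time", "sea_level", "qc_flag", "use_flag"],
    )
    df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"])
    print(df[["datetime", "sea_level"]].head())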
Contributors include: British Oceanographic Data Centre; University of Hawaii Sea Level Center; Japan Meteorological Agency; US National Oceanic and Atmospheric Administration; Puertos del Estado, Spain; Marine Environmental Data Service, Canada; Instituto Espanol de Oceanografica, Spain; idromare, Italy; Swedish Meteorological and Hydrological Institute; Federal Maritime and Hydrographic Agency, Germany; Finnish Meteorological Institute; Service hydrographique et océanographique de la Marine, France; Rijkswaterstaat, Netherlands; Danish Meteorological Institute; Norwegian Hydrographic Service; Icelandic Coastguard Service; Istituto Talassographico di Trieste; Venice Commune, Italy;
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
All the raw reads deposited in the European Nucleotide Archive (ENA) or in the NCBI Sequence Read Archive (SRA) as Y. enterocolitica at the time of the analysis (August 2018) were retrieved using getSeqENA. A total of 252 genomes were successfully assembled using INNUca v3.1. In addition to publicly available genomes, the database includes 79 novel Y. enterocolitica strains which belong to the INNUENDO Sequence Dataset (PRJEB27020).
File 'Metadata/Yenterocolitica_metadata.txt' contains metadata information for each strain, including country and year of isolation, source classification, taxon of the host, serotype, biotype, pathotype (according to the patho_typing software) and the classical pubMLST 7-gene ST according to Hall et al., 2005.
The directory 'Genomes' contains all 331 INNUca v3.1 assemblies of the strains listed in 'Metadata/Yenterocolitica_metadata.txt'.
Schema creation and validation
All 331 genomes were used to create the schema with the chewBBACA suite. The quality of the loci was assessed using chewBBACA Schema Evaluation, and loci with single alleles, loci with high length variability (i.e. more than one allele outside the mode +/- 0.05 of the modal size) and loci present in less than 1% of the genomes were removed. The wgMLST schema was further curated by excluding all loci detected as “Repeated Loci” or annotated as “non-informative paralogous hit (NIPH/NIPHEM)” or “Allele Larger/Smaller than length mode (ALM/ASM)” by chewBBACA Allele Calling in more than 1% of the dataset.
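The length-variability criterion can be re-implemented in a few lines; the sketch below is illustrative only (it is not chewBBACA's own code) and assumes allele lengths are already available per locus.

    from statistics import mode

    def keep_locus(allele_lengths, tol=0.05):
        # A locus is dropped when more than one allele lies outside
        # mode +/- 5% of the modal allele length (criterion described above).
        m = mode(allele_lengths)
        outside = sum(1 for length in allele_lengths if abs(length - m) > tol * m)
        return outside <= 1

    # Example: one short outlier allele is tolerated, two are not.
    print(keep_locus([300, 300, 301, 250]))   # True
    print(keep_locus([300, 300, 250, 250]))   # False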
File 'Schema/Yenterocolitica_wgMLST_6344_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 6,344 loci.
File 'Schema/Yenterocolitica_cgMLST_2406_listGenes.txt' contains the list of genes from the wgMLST schema which defines the cgMLST schema. The cgMLST schema consists of 2,406 loci and has been defined as the loci present in at least 99% of the 331 Y. enterocolitica genomes, with each genome having no more than 2% missing loci.
File 'Allele_Profles/Yenterocolitica_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profile of the 331 Y. enterocolitica genomes of the dataset. Please note that missing loci follow the annotation of chewBBACA Allele Calling software.
File 'Allele_Profles/Yenterocolitica_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profile of the 331 Y. enterocolitica genomes of the dataset. Please note that missing loci are indicated with a zero.
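The core-genome definition above can be recovered from the allele profiles directly; the sketch below assumes the first column of the TSV holds the genome identifier and, as noted above, that missing loci are coded as zero.

    import pandas as pd

    # Hedged sketch: the index column and zero-as-missing convention are
    # assumptions based on the file descriptions above.
    profiles = pd.read_csv(
        "Allele_Profles/Yenterocolitica_cgMLST_alleleProfiles.tsv",
        sep="\t", index_col=0,
    )
    presence = (profiles != 0).mean()   # per-locus fraction of genomes with a call
    core_loci = presence[presence >= 0.99].index.tolist()
    print(len(core_loci), "loci present in at least 99% of genomes")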
Additional citation
The schema are prepared to be used with chewBBACA. When using the schema in this repository, please also cite:
Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. Microbial Genomics 4(3), 15/03/2018. doi:10.1099/mgen.0.000166. http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000166
Identifying a user’s transportation mode through observations of the user, or observations of the environment, is a growing topic of research, with many applications in the field of the Internet of Things (IoT). Transportation mode detection can provide context information useful for offering appropriate services based on the user’s needs and possibilities of interaction.
In the initial data pre-processing phase, data cleaning operations are performed, such as deleting measurements from the sensors to be excluded and making the values of the sound and speed sensors positive.
Furthermore, some sensors, such as the ambient sensors (sound, light and pressure) and the proximity sensor, return a single value per measurement, which can be used directly in the dataset. All the others return multiple values that are tied to the coordinate system used, so their readings depend strongly on device orientation. For almost all of them we can use an orientation-independent metric: the magnitude.
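For a three-axis sensor such as the accelerometer or gyroscope, the magnitude is simply the Euclidean norm of the reading, as in the short sketch below (illustrative; the function and variable names are our own).

    import math

    def magnitude(x: float, y: float, z: float) -> float:
        # Orientation-independent magnitude of a 3-axis sensor reading.
        return math.sqrt(x * x + y * y + z * z)

    # Example: a phone at rest reads roughly (0, 0, 9.81) m/s^2 in one
    # orientation and (9.81, 0, 0) in another; the magnitude is ~9.81 in both.
    print(magnitude(0.0, 0.0, 9.81))
    print(magnitude(9.81, 0.0, 0.0))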
Sensors measure different physical quantities and provide corresponding raw readings, which are a source of information about the user and their environment. Due to advances in sensor technology, sensors are getting more powerful, cheaper and smaller. Almost all mobile phones currently include sensors that allow the capture of important context information. For this reason, one of the key sensing platforms employed by context-aware applications is the mobile phone, which has become a central part of users’ lives.
User transportation mode recognition can be considered a Human Activity Recognition (HAR) task. Its goal is to identify which kind of transportation - walking, driving, etc. - a person is using. Transportation mode recognition can provide context information to enhance applications and deliver a better user experience, and it can be crucial for many different applications, such as device profiling, monitoring road and traffic conditions, healthcare, and traveling support.
Original dataset from: Carpineti C., Lomonaco V., Bedogni L., Di Felice M., Bononi L., "Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity", in Proceedings of the 14th Workshop on Context and Activity Modeling and Recognition (IEEE COMOREA 2018), Athens, Greece, March 19-23, 2018 [Pre-print available]
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The proposed AIS dataset covers a substantial temporal span of 20 months, from April 2021 to December 2022. This extensive coverage period empowers analysts to examine long-term trends and variations in vessel activities. Moreover, it facilitates researchers in comprehending the potential influence of external factors, including weather patterns, seasonal variations, and economic conditions, on vessel traffic and behavior within Finnish waters.
The dataset contains an extensive array of data pertaining to vessel movements and activities across seas, rivers, and lakes. It is anticipated to be comprehensive in nature, covering a diverse range of ship types, such as cargo ships, tankers, fishing vessels, passenger ships, and various other categories.
A prominent attribute of the AIS dataset is its exceptional granularity, with a total of 2 293 129 345 data points. Such granular information can help analysts comprehend vessel dynamics and operations within Finnish waters. It enables the identification of patterns and anomalies in vessel behavior and facilitates an assessment of the potential environmental implications associated with maritime activities.
Please cite the following publication when using the dataset:
TBD
The publication is available at: TBD
A preprint version of the publication is available at TBD
This file contains the received AIS position reports. The structure of the logged parameters is the following: [timestamp, timestampExternal, mmsi, lon, lat, sog, cog, navStat, rot, posAcc, raim, heading]
timestamp
We believe this is the UTC second when the report was generated by the electronic position fixing system (EPFS) (0-59; or 60 if the time stamp is not available, which should also be the default value; or 61 if the positioning system is in manual input mode; or 62 if the electronic position fixing system operates in estimated (dead reckoning) mode; or 63 if the positioning system is inoperative).
timestampExternal
The timestamp associated with the MQTT message received from www.digitraffic.fi. It is assumed this timestamp is the Epoch time corresponding to when the AIS message was received by digitraffic.fi.
mmsi
MMSI number, Maritime Mobile Service Identity (MMSI) is a unique 9 digit number that is assigned to a (Digital Selective Calling) DSC radio or an AIS unit. Check https://en.wikipedia.org/wiki/Maritime_Mobile_Service_Identity
lon
Longitude in 1/10 000 min (+/-180 deg; East = positive, West = negative, as per 2's complement; 181 deg (6791AC0h) = not available = default)
lat
Latitude in 1/10 000 min (+/-90 deg; North = positive, South = negative, as per 2's complement; 91 deg (3412140h) = not available = default)
sog
Speed over ground in 1/10 knot steps (0-102.2 knots); 1 023 = not available, 1 022 = 102.2 knots or higher
cog
Course over ground in 1/10 deg (0-3599); 3600 (E10h) = not available = default; 3 601-4 095 should not be used
navStat
Navigational status:
0 = under way using engine
1 = at anchor
2 = not under command
3 = restricted maneuverability
4 = constrained by her draught
5 = moored
6 = aground
7 = engaged in fishing
8 = under way sailing
9 = reserved for future amendment of navigational status for ships carrying DG, HS, or MP, or IMO hazard or pollutant category C, high speed craft (HSC)
10 = reserved for future amendment of navigational status for ships carrying dangerous goods (DG), harmful substances (HS) or marine pollutants (MP), or IMO hazard or pollutant category A, wing in ground (WIG)
11 = power-driven vessel towing astern (regional use)
12 = power-driven vessel pushing ahead or towing alongside (regional use)
13 = reserved for future use
14 = AIS-SART (active), MOB-AIS, EPIRB-AIS
15 = undefined = default (also used by AIS-SART, MOB-AIS and EPIRB-AIS under test)
rot
Rate of turn (ROTAIS).
ROT data should not be derived from COG information.
posAcc
Position accuracy. The position accuracy (PA) flag should be determined in accordance with the table at the link below:
See https://www.navcen.uscg.gov/?pageName=AISMessagesA#RAIM
raim
RAIM-flag Receiver autonomous integrity monitoring (RAIM) flag of electronic position fixing device; 0 = RAIM not in use = default; 1 = RAIM in use. See Table https://www.navcen.uscg.gov/?pageName=AISMessagesA#RAIM
Check https://en.wikipedia.org/wiki/Receiver_autonomous_integrity_monitoring
heading
True heading, Degrees (0-359) (511 indicates not available = default)
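If the logged fields preserve the raw AIS encodings described above (an assumption; the digitraffic.fi feed may already deliver decoded values), they can be converted to conventional units as in the sketch below.

    def decode_position_report(msg: dict) -> dict:
        # Hedged sketch: assumes raw integer encodings as documented above.
        lon = msg["lon"] / 600000.0    # 1/10 000 min -> degrees
        lat = msg["lat"] / 600000.0
        sog = msg["sog"] / 10.0        # 1/10 knot -> knots
        cog = msg["cog"] / 10.0        # 1/10 deg -> degrees
        return {
            "lon_deg": None if lon == 181 else lon,         # 181 deg = not available
            "lat_deg": None if lat == 91 else lat,          # 91 deg = not available
            "sog_kn": None if msg["sog"] == 1023 else sog,  # 1 023 = not available
            "cog_deg": None if msg["cog"] == 3600 else cog, # 3600 = not available
            "heading_deg": None if msg["heading"] == 511 else msg["heading"],
        }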
This file contains the received AIS metadata: the ship static and voyage related data. The structure of the logged parameters is the following: [timestamp, destination, mmsi, callSign, imo, shipType, draught, eta, posType, pointA, pointB, pointC, pointD, name]
timestamp
The timestamp associated with the MQTT message received from www.digitraffic.fi. It is assumed this timestamp is the Epoch time corresponding to when the AIS message was received by digitraffic.fi.
destination
Maximum 20 characters using 6-bit ASCII; @@@@@@@@@@@@@@@@@@@@ = not available. For SAR aircraft, the use of this field may be decided by the responsible administration.
mmsi
MMSI number, Maritime Mobile Service Identity (MMSI) is a unique 9 digit number that is assigned to a (Digital Selective Calling) DSC radio or an AIS unit. Check https://en.wikipedia.org/wiki/Maritime_Mobile_Service_Identity
callSign
7 × 6-bit ASCII characters; @@@@@@@ = not available = default. Craft associated with a parent vessel should use “A” followed by the last 6 digits of the MMSI of the parent vessel. Examples of such craft include towed vessels, rescue boats, tenders, lifeboats and liferafts.
imo
0 = not available = default – Not applicable to SAR aircraft
Check: https://en.wikipedia.org/wiki/IMO_number
shipType
Check https://www.navcen.uscg.gov/pdf/AIS/AISGuide.pdf and https://www.navcen.uscg.gov/?pageName=AISMessagesAStatic
draught
In 1/10 m; 255 = draught 25.5 m or greater, 0 = not available = default; in accordance with IMO Resolution A.851. Not applicable to SAR aircraft; should be set to 0.
eta
Estimated time of arrival; MMDDHHMM UTC
For SAR aircraft, the use of this field may be decided by the responsible administration
posType
Type of electronic position fixing device
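Analogously, a minimal sketch for the static and voyage fields is given below, assuming draught is logged in 1/10 m and eta as a zero-padded MMDDHHMM value, as described above; both encodings are assumptions to verify against the logged data.

    def decode_static(msg: dict) -> dict:
        # Hedged sketch: encodings are assumptions based on the field notes above.
        draught = msg["draught"]            # 1/10 m; 0 = not available
        eta = f'{int(msg["eta"]):08d}'      # MMDDHHMM, zero-padded
        month, day = int(eta[0:2]), int(eta[2:4])
        hour, minute = int(eta[4:6]), int(eta[6:8])
        return {
            "draught_m": None if draught == 0 else draught / 10.0,
            # month 0 or day 0 is conventionally used for "ETA not available"
            "eta_mmddhhmm": None if month == 0 or day == 0 else (month, day, hour, minute),
        }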
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comes as an SQL-importable file and is compatible with the widely available MariaDB and MySQL databases.
It is based on (and incorporates/extends) the dataset "1151 commits with software maintenance activity labels (corrective,perfective,adaptive)" by Levin and Yehudai (https://doi.org/10.5281/zenodo.835534).
The extensions to this dataset were obtained using Git-Tools, a tool that is included in the Git-Density (https://doi.org/10.5281/zenodo.2565238) suite. For each of the projects in the original dataset, Git-Tools was run in extended mode.
The dataset contains these tables:
The dataset contains these views:
Features of the gtools_ex dataset:
This dataset supports the paper "Importance and Aptitude of Source code Density for Commit Classification into Maintenance Activities", as submitted to the QRS 2019 conference (The 19th IEEE International Conference on Software Quality, Reliability, and Security). Citation: Hönel, S., Ericsson, M., Löwe, W. and Wingkvist, A., 2019. Importance and Aptitude of Source code Density for Commit Classification into Maintenance Activities. In The 19th IEEE International Conference on Software Quality, Reliability, and Security.