Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For calculating the GLOWOPT representative route network, a forecast model chain was used. The model was calibrated with 2019 flight movement data (unaffected by COVID-19) and provides forecast aircraft movements from the base year 2019 (~2020) to 2050 in five-year intervals.
The results of the forecast model chain are provided in two dataset formats: a CSV file format and a 4-dimensional MATLAB array (.mat).
CSV Datasets
For each forecast year, a CSV file is generated containing the origin-destination (OD) airport IATA codes, the region, latitude and longitude of the OD pair, the representative aircraft type along with the aircraft category, the average load factor and, finally, the distance between the OD pair. The airports worldwide are subdivided into nine regions, namely Africa, Asia, Caribbean, Central America, Europe, Middle East, North America, Oceania and South America. There are a total of seven datasets, one for each forecast year, i.e. for the years 2019 (~2020), 2025, 2030, 2035, 2040, 2045 and 2050. A short loading example follows the list of data labels below.
Description of the data labels:
Origin- Origin airport IATA code
Origin_Region- Region of the Origin Airport
Origin_Latitude- Latitude of the Origin Airport
Origin_Longitude- Longitude of the Origin Airport
Destination- Destination airport IATA code
Destination_Region- Region of the Destination Airport
Destination_Latitude- Latitude of the Destination Airport
Destination_Longitude- Longitude of the Destination Airport
AcType- Representative aircraft type
Load_Factor- Average load factor per flight
Yearly_Frequency- Total aircraft movements per annum
RefACType- Aircraft category based on number of seats (category 6 represents aircraft with 252-301 seats and category 7 represents aircraft with 302 or more seats)
Distance- Great circle distance between Origin and Destination in km
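The CSV files can be loaded with any standard tabular tool; a minimal sketch in Python/pandas is given below, assuming a hypothetical file name and using the data labels listed above.

```python
# Minimal sketch for loading one forecast-year CSV and filtering it by
# region and aircraft category. The file name is hypothetical; column
# names follow the data labels listed above.
import pandas as pd

routes = pd.read_csv("route_network_2030.csv")  # hypothetical file name

# Example: all Europe -> Asia routes flown by category-6 aircraft.
# Assumes RefACType is stored as a number; adjust if it is a string label.
mask = (
    (routes["Origin_Region"] == "Europe")
    & (routes["Destination_Region"] == "Asia")
    & (routes["RefACType"] == 6)
)
selection = routes.loc[mask, ["Origin", "Destination", "AcType",
                              "Yearly_Frequency", "Distance"]]
print(selection.head())
```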
MATLAB Datasets
The dataset generated with MATLAB is a 4-dimensional array with the extension *.mat. The first dimension is the region of the origin airport and the second dimension is the region of the destination airport. The third and fourth dimensions are the aircraft category based on seat numbers and the categorised great circle distance. Each element is a 1x1 cell containing the IATA codes of the OD pairs, the frequency and the great circle distance in km.
The 4D array is categorised such that the user can select the route segment specific to a region or a combination of regions. The range categorisation in combination with an aircraft category additionally offers the user the possibility to select routes depending on their great circle distances. The ranges are categorised to represent very short range (0-2000 km), short range (2000-6000 km), medium range (6000-10000 km) and long range (10000 – 15000 km).
Indexing based on the categorisation of the 4D array dataset - Refer to file 'Indexing_MAT_Dataset.PNG'
For example:
To derive the OD pairs and yearly frequency of aircraft movements for routes which originate from Europe and are destined for Asia, are operated with a category 6 aircraft type, and are separated by distances between 10,000 and 15,000 km:
In MATLAB (indexing based on file 'Indexing_MAT_Dataset.PNG'):
Route_Network(5,2,1,4)
Description on Index:
5 – Europe: Origin Region
2 – Asia: Destination Region
1 – Category 6: Aircraft Type
4 – 10000-15000 km: Range
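For users working outside MATLAB, a rough equivalent of the indexing example above is sketched below with scipy.io.loadmat. The variable name Route_Network and the file name are assumptions (the actual variable name inside the .mat file may differ), and MATLAB's 1-based indices become 0-based in Python.

```python
# Sketch for reading the 4-D MATLAB dataset from Python. Assumes the .mat
# file stores a cell array named 'Route_Network' (the name used in the
# MATLAB example above); the file name is hypothetical.
from scipy.io import loadmat

mat = loadmat("Route_Network_2030.mat")
route_network = mat["Route_Network"]

# MATLAB's Route_Network(5,2,1,4) becomes zero-based indexing in Python:
# Europe (5) -> 4, Asia (2) -> 1, category 6 (1) -> 0, 10000-15000 km (4) -> 3
cell = route_network[4, 1, 0, 3]
print(cell)  # 1x1 cell with OD IATA codes, yearly frequency and distance (km)
```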
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Note: Updates to this data product are discontinued. Dozens of definitions are currently used by Federal and State agencies, researchers, and policymakers. The ERS Rural Definitions data product allows users to make comparisons among nine representative rural definitions.
Methods of designating the urban periphery range from the use of municipal boundaries to definitions based on counties. Definitions based on municipal boundaries may classify as rural much of what would typically be considered suburban. Definitions that delineate the urban periphery based on counties may include extensive segments of a county that many would consider rural.
We have selected a representative set of nine alternative rural definitions and compare social and economic indicators from the 2000 decennial census across the nine definitions. We chose socioeconomic indicators (population, education, poverty, etc.) that are commonly used to highlight differences between urban and rural areas. This record was taken from the USDA Enterprise Data Inventory that feeds into the https://data.gov catalog. Data for this record includes the following resources: a webpage with links to Excel files and state-level maps. For complete information, please visit https://data.gov.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Webster Dictionary: Sentiment Dataset is a vocabulary dataset for sentiment analysis that provides sentiment scores and sentiment classifications (positive, negative, and neutral) for each of the 100,000 words in the Webster Dictionary, calculated from the sum of weights from three models: AFINN, VADER, and TextBlob.
2) Data Utilization
(1) The Webster Dictionary: Sentiment Dataset has the following characteristics:
• Each row contains a word, its positive/negative/neutral classification, and a weight-based sentiment score; approximately 2,100 out of 106,351 words are categorised as positive, 2,600 as negative, and the rest as neutral.
• Sentiment scores gain reliability by combining the results of three representative sentiment analysis models, and subtle differences in sentiment between words can be expressed numerically.
(2) The Webster Dictionary: Sentiment Dataset can be used for:
• Text sentiment analysis and vocabulary expansion: it can be applied to text data such as news, reviews, and social media to increase the accuracy of sentiment analysis and to build domain-specific sentiment lexicons.
• Natural language processing and recommendation systems: word-level sentiment scores can support sentence- and document-level sentiment prediction, review-based recommendation systems, and the emotional response design of conversational AI.
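The description names the three models but not the exact weighting scheme. Purely as an illustration of how such a combined score could be computed, here is a sketch using the afinn, vaderSentiment and textblob packages, with an equal-weight sum as an assumption.

```python
# Illustrative sketch of combining AFINN, VADER and TextBlob scores for a
# single word. The actual weighting used to build the dataset is not
# specified; the simple sum below is an assumption.
from afinn import Afinn
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

afinn = Afinn()
vader = SentimentIntensityAnalyzer()

def combined_score(word: str) -> float:
    s_afinn = afinn.score(word)                         # roughly -5 .. +5
    s_vader = vader.polarity_scores(word)["compound"]   # -1 .. +1
    s_blob = TextBlob(word).sentiment.polarity          # -1 .. +1
    return s_afinn + s_vader + s_blob                   # assumed equal weights

def label(score: float) -> str:
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

score = combined_score("happy")
print(score, label(score))
```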
https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.57745/ZDNOGF
The “Training and development dataset for information extraction in plant epidemiomonitoring” is the annotation set of the “Corpus for the epidemiomonitoring of plant”. The annotations include seven entity types (e.g. species, locations, diseases), their normalisation against the NCBI taxonomy and GeoNames, and binary (seven types) and ternary relationships. The annotations refer to character positions within the documents of the corpus. The annotation guidelines give their definitions and representative examples. Both datasets are intended for the training and validation of information extraction methods.
After defining a taxonomy of the main stone deterioration patterns and anomalies, we selected 354 highly representative images of stone-built heritage and provided annotators with a careful selection of labels to choose from.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all recorded and hand-annotated data, all synthetically generated data, and representative trained networks used for the semantic and instance segmentation experiments in the replicAnt - generating annotated images of animals in complex environments using Unreal Engine manuscript. Unless stated otherwise, all 3D animal models used in the synthetically generated data were created with the open-source photogrammetry platform scAnt (peerj.com/articles/11155/). All synthetic data were generated with the associated replicAnt project available from https://github.com/evo-biomech/replicAnt.
Abstract:
Deep learning-based computer vision methods are transforming animal behavioural research. Transfer learning has enabled work in non-model species, but still requires hand-annotation of example footage, and is only performant in well-defined conditions. To overcome these limitations, we created replicAnt, a configurable pipeline implemented in Unreal Engine 5 and Python, designed to generate large and variable training datasets on consumer-grade hardware instead. replicAnt places 3D animal models into complex, procedurally generated environments, from which automatically annotated images can be exported. We demonstrate that synthetic data generated with replicAnt can significantly reduce the hand-annotation required to achieve benchmark performance in common applications such as animal detection, tracking, pose-estimation, and semantic segmentation; and that it increases the subject-specificity and domain-invariance of the trained networks, so conferring robustness. In some applications, replicAnt may even remove the need for hand-annotation altogether. It thus represents a significant step towards porting deep learning-based computer vision tools to the field.
Benchmark data
Two pose-estimation datasets were procured. Both datasets used first instar Sungaya inexpectata (Zompro, 1996) stick insects as a model species. Recordings from an evenly lit platform served as representative of controlled laboratory conditions; recordings from a hand-held phone camera served as an approximate example of serendipitous recordings in the field.
For the platform experiments, walking S. inexpectata were recorded using a calibrated array of five FLIR blackfly colour cameras (Blackfly S USB3, Teledyne FLIR LLC, Wilsonville, Oregon, U.S.), each equipped with 8 mm c-mount lenses (M0828-MPW3 8MM 6MP F2.8-16 C-MOUNT, CBC Co., Ltd., Tokyo, Japan). All videos were recorded with 55 fps, and at the sensors’ native resolution of 2048 px by 1536 px. The cameras were synchronised for simultaneous capture from five perspectives (top, front right and left, back right and left), allowing for time-resolved, 3D reconstruction of animal pose.
The handheld footage was recorded in landscape orientation with a Huawei P20 (Huawei Technologies Co., Ltd., Shenzhen, China) in stabilised video mode: S. inexpectata were recorded walking across cluttered environments (hands, lab benches, PhD desks etc), resulting in frequent partial occlusions, magnification changes, and uneven lighting, so creating a more varied pose-estimation dataset.
Representative frames were extracted from videos using DeepLabCut (DLC)-internal k-means clustering. 46 key points in 805 and 200 frames for the platform and handheld case, respectively, were subsequently hand-annotated using the DLC annotation GUI.
Synthetic data
We generated a synthetic dataset of 10,000 images at a resolution of 1500 by 1500 px, based on a 3D model of a first instar S. inexpectata specimen, generated with the scAnt photogrammetry workflow. Generating 10,000 samples took about three hours on a consumer-grade laptop (6 Core 4 GHz CPU, 16 GB RAM, RTX 2070 Super). We applied 70% scale variation, and enforced hue, brightness, contrast, and saturation shifts, to generate 10 separate sub-datasets containing 1000 samples each, which were combined to form the full dataset.
Funding
This study received funding from Imperial College’s President’s PhD Scholarship (to Fabian Plum), and is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement No. 851705, to David Labonte). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
An in-depth description of the various Political Boundaries GIS data layers outlining terms of use, update frequency, attribute explanations, and more. District data layers include: Lake County Boundary, County Board, Judicial Circuit Court Subcircuits, Political Townships, State Representative Districts, State Senate, Congressional Districts, and Voting Precincts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"
DOI: https://doi.org/10.11583/DTU.c.6287841
Using deep learning for detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour.
The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.
We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others: colliding vessels, vessels engaged in Search-and-Rescue activities, law enforcement, and commercial maritime traffic forced to deviate from the normal course.
These datasets consist of labelled trajectories for the purpose of evaluating unsupervised models for the detection of abnormal maritime behaviour. For unlabelled datasets for training, please refer to the collection (link in Related publications).
The dataset is an example of a SAR event and cannot be considered representative of a large population of all SAR events.
The dataset consists of a total of 521 trajectories, of which 25 are labelled as abnormal. The data were captured on a single day in a specific region. The remaining normal traffic is representative of the traffic during the winter season. The normal traffic in the ROI has a fairly high seasonality related to fishing and leisure sailing traffic.
The data is saved using the pickle format for Python. Each dataset is split into 2 files with naming convention:
datasetInfo_XXX
data_XXX
Files named "data_XXX" contain the extracted trajectories, serialized sequentially one at a time, and must be read as such. Please refer to the provided utility functions for examples. Files named "datasetInfo_XXX" contain metadata related to the dataset and the indices at which trajectories begin in the "data_XXX" files.
The data are sequences of maritime trajectories defined by their timestamp, latitude/longitude position, speed, course, and unique ship identifier (MMSI). In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, moving AIS navigational statuses, and filtered within a region of interest (ROI). Trajectories were split if exceeding an upper limit, and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.
Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl
See the datasheet for more detailed information; we refer to the provided utility functions for examples of how to read and plot the data.
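As a rough illustration of the sequential serialization described above, the sketch below reads one pickled trajectory at a time until the end of file. The file name is a hypothetical example of the naming syntax, and the dataset's own utility functions should be preferred.

```python
# Minimal sketch for reading sequentially pickled trajectories from a
# "data_XXX" file. This is an assumption-based illustration; use the
# dataset's provided utility functions where possible.
import pickle

trajectories = []
with open("data_AIS_Custom_example.pkl", "rb") as f:  # hypothetical file name
    while True:
        try:
            trajectories.append(pickle.load(f))  # one trajectory per load call
        except EOFError:
            break

print(f"Read {len(trajectories)} trajectories")
```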
This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.
This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.
The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.
Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.
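As a toy illustration of the general approach described above (fit a statistical model to the confidential data, then sample a synthetic replacement), consider the sketch below. It is not the model used for the Integrated Services data; the column names and the independence assumption are illustrative only.

```python
# Toy illustration of synthetic data generation: fit simple per-column
# distributions to "confidential" data, then sample a synthetic dataset.
# Not the model used for this project; columns and assumptions are invented.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

confidential = pd.DataFrame({
    "service": rng.choice(["housing", "behavioral_health", "aging"], size=1000),
    "age": rng.integers(18, 90, size=1000),
})

# Fit per-column distributions (independent marginals, a strong simplification).
service_probs = confidential["service"].value_counts(normalize=True)
age_mean, age_std = confidential["age"].mean(), confidential["age"].std()

synthetic = pd.DataFrame({
    "service": rng.choice(service_probs.index.to_numpy(),
                          size=len(confidential), p=service_probs.to_numpy()),
    "age": rng.normal(age_mean, age_std, size=len(confidential))
              .round().clip(18, 90).astype(int),
})
print(synthetic.head())
```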
For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.
Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.
Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).
1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. This dataset has two layers and includes both the weighted data and unweighted data. Weighting data is a statistical method in which datasets are adjusted through calculations in order to more accurately represent the population being studied. The weighted data are used in the final published PDF report.
These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2023):
1. Safe and Secure Communities
1.04 Fire Services Satisfaction
1.06 Crime Reporting
1.07 Police Services Satisfaction
1.09 Victim of Crime
1.10 Worry About Being a Victim
1.11 Feeling Safe in City Facilities
1.23 Feeling of Safety in Parks
2. Strong Community Connections
2.02 Customer Service Satisfaction
2.04 City Website Satisfaction
2.05 Online Services Satisfaction Rate
2.15 Feeling Invited to Participate in City Decisions
2.21 Satisfaction with Availability of City Information
3. Quality of Life
3.16 City Recreation, Arts, and Cultural Centers
3.17 Community Services Programs
3.19 Value of Special Events
3.23 Right of Way Landscape Maintenance
3.36 Quality of City Services
4. Sustainable Growth & Development
No performance measures in this category presently relate directly to the Community Survey.
5. Financial Stability & Vitality
No performance measures in this category presently relate directly to the Community Survey.
Methods:
The survey is mailed to a random sample of households in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent's address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of the city population.
Processing and Limitations:
The location data in this dataset are generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensures that points are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city.
The weighted data are used by the ETC Institute in the final published PDF report. The 2023 Annual Community Survey report is available on data.tempe.gov or by visiting https://www.tempe.gov/government/strategic-management-and-innovation/signature-surveys-research-and-data
The individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.
Additional Information
Source: Community Attitude Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
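As a generic illustration of the weighting concept described above (not ETC Institute's actual procedure), a simple post-stratification sketch with hypothetical demographic groups might look like this:

```python
# Generic post-stratification weighting sketch: weight each respondent by
# (population share of their group) / (sample share of that group).
# Group labels and shares are hypothetical illustrations only.
import pandas as pd

responses = pd.DataFrame({"age_group": ["18-34", "35-54", "55+", "18-34", "55+"]})

population_share = pd.Series({"18-34": 0.40, "35-54": 0.35, "55+": 0.25})
sample_share = responses["age_group"].value_counts(normalize=True)

responses["weight"] = responses["age_group"].map(population_share / sample_share)
print(responses)
```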
Panel data possess several advantages over conventional cross-sectional and time-series data, including their power to isolate the effects of specific actions, treatments, and general policies often at the core of large-scale econometric development studies. While the concept of panel data alone provides the capacity for modeling the complexities of human behavior, the notion of universal panel data – in which time- and situation-driven variances leading to variations in tools, and thus results, are mitigated – can further enhance exploitation of the richness of panel information.
This Basic Information Document (BID) provides a brief overview of the Tanzania National Panel Survey (NPS), but focuses primarily on the theoretical development and application of panel data, as well as key elements of the universal panel survey instrument and datasets generated by the four rounds of the NPS. As this Basic Information Document (BID) for the UPD does not describe in detail the background, development, or use of the NPS itself, the round-specific NPS BIDs should supplement the information provided here.
The NPS Uniform Panel Dataset (UPD) consists of both survey instruments and datasets, meticulously aligned and engineered with the aim of facilitating the use of and improving access to the wealth of panel data offered by the NPS. The NPS-UPD provides a consistent and straightforward means of conducting not only user-driven analyses using convenient, standardized tools, but also for monitoring MKUKUTA, FYDP II, and other national level development indicators reported by the NPS.
The design of the NPS-UPD combines the four completed rounds of the NPS – NPS 2008/09 (R1), NPS 2010/11 (R2), NPS 2012/13 (R3), and NPS 2014/15 (R4) – into pooled, module-specific survey instruments and datasets. The panel survey instruments offer the ease of comparability over time, with modifications and variances easily identifiable as well as those aspects of the questionnaire which have remained identical and offer consistent information. By providing all module-specific data over time within compact, pooled datasets, panel datasets eliminate the need for user-generated merges between rounds and present data in a clear, logical format, increasing both the usability and comprehension of complex data.
Designed for analysis of key indicators at four primary domains of inference, namely: Dar es Salaam, other urban, rural, Zanzibar.
The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.
Sample survey data [ssd]
While the same sample of respondents was maintained over the first three rounds of the NPS, longitudinal surveys tend to suffer from bias introduced by households leaving the survey over time, i.e. attrition. Although the NPS maintains a highly successful recapture rate (roughly 96% retention at the household level), minimizing the escalation of this selection bias, a refresh of longitudinal cohorts was done for the NPS 2014/15 to ensure proper representativeness of estimates while maintaining a sufficient primary sample to preserve cohesion within panel analysis. A newly completed Population and Housing Census (PHC) in 2012, providing updated population figures along with changes in administrative boundaries, provided the opportunity to realign the NPS sample and abate collective bias potentially introduced through attrition.
To maintain the panel concept of the NPS, the sample design for NPS 2014/2015 consisted of a combination of the original NPS sample and a new NPS sample. A nationally representative sub-sample was selected to continue as part of the “Extended Panel” while an entirely new sample, “Refresh Panel”, was selected to represent national and sub-national domains. Similar to the sample in NPS 2008/2009, the sample design for the “Refresh Panel” allows analysis at four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar. This new cohort in NPS 2014/2015 will be maintained and tracked in all future rounds between national censuses.
Face-to-face [f2f]
The format of the NPS-UPD survey instrument is similar to previously disseminated NPS survey instruments. Each module has a questionnaire and clearly identifies whether the module collects information at the individual or household level. Within each module-specific questionnaire of the NPS-UPD survey instrument, there are five distinct sections, arranged vertically: (1) the UPD – “U” on the survey instrument, (2) R4, (3) R3, (4) R2, and (5) R1 – the latter four sections presenting each questionnaire in its original form at the time of its respective dissemination.
The uppermost section of each module’s questionnaire (“U”) represents the model universal panel questionnaire, with questions generated from the comprehensive listing of questions across all four rounds of the NPS and codes generated from the comprehensive collection of codes. The following sections are arranged vertically by round, considering R4 as most recent. While not all rounds will have data reported for each question in the UPD and not each question will have reports for each of the UPD codes listed, the NPS-UPD survey instrument represents the visual, all-inclusive set of information collected by the NPS over time.
The four round-specific sections (R4, R3, R2, R1) are aligned with their UPD-equivalent question, visually presenting their contribution to compatibility with the UPD. Each round-specific section includes the original round-specific variable names, response codes and skip patterns (corresponding to their respective round-specific NPS data sets, and despite their variance from other rounds or from the comprehensive UPD code listing).
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Data generation in machine learning involves creating or manipulating data to train and evaluate machine learning models. The purpose of data generation is to provide diverse and representative examples that cover a wide range of scenarios, ensuring the model's robustness and generalization. Data augmentation techniques involve applying various transformations to existing data samples to create new ones. These transformations include: random rotations, translations, scaling, flips, and more. Augmentation helps in increasing the dataset size, introducing natural variations, and improving model performance by making it more invariant to specific transformations. The dataset contains GENERATED USA passports, which are replicas of official passports but with randomly generated details, such as name, date of birth etc. The primary intention of generating these fake passports is to demonstrate the structure and content of a typical passport document and to train the neural network to identify this type of document. Generated passports can assist in conducting research without accessing or compromising real user data that is often sensitive and subject to privacy regulations. Synthetic data generation allows researchers to develop and refine models using simulated passport data without risking privacy leaks.
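As a small illustration of the augmentation transformations listed above (rotations, translations, scaling, flips, colour shifts), here is a sketch using torchvision; the parameter values and file names are assumptions, not the settings used to build this dataset.

```python
# Minimal augmentation sketch with torchvision, mirroring the transformations
# listed above. Parameter values and file names are illustrative assumptions.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
])

image = Image.open("passport_sample.png")       # hypothetical file name
augmented = augment(image)
augmented.save("passport_sample_augmented.png")
```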
The study included four separate surveys:
The LSMS survey of the general population of Serbia in 2002 (basic survey)
The survey of Family Income Support (MOP in Serbian) recipients in 2002
These two datasets are published together, separately from the 2003 datasets.
The LSMS survey of the general population of Serbia in 2003 (panel survey)
The survey of Roma from Roma settlements in 2003
These two datasets are published together.
Objectives
LSMS represents a multi-topical study of household living standards and is based on international experience in designing and conducting this type of research. The basic survey was carried out in 2002 on a representative sample of households in Serbia (without Kosovo and Metohija). Its goal was to establish a poverty profile according to comprehensive data on household welfare and to identify vulnerable groups. Its aim was also to assess the targeting of safety net programs by collecting detailed information from individuals on participation in specific government social programs. This study was used as the basic document in developing the Poverty Reduction Strategy (PRS) in Serbia, which was adopted by the Government of the Republic of Serbia in October 2003.
The survey was repeated in 2003 on a panel sample (the households which participated in 2002 survey were re-interviewed).
Analysis of the take-up and profile of the population in 2003 was the first step towards formulating the system of monitoring in the Poverty Reduction Strategy (PRS). The survey was conducted in accordance with the same methodological principles used in the 2002 survey, with necessary changes referring only to the content of certain modules and the reduction in sample size. The aim of the repeated survey was to obtain panel data to enable monitoring of the change in the living standard within a period of one year, thus indicating whether there had been a decrease or increase in poverty in Serbia in the course of 2003. [Note: Panel data are the data obtained on the sample of households which participated in both surveys. These data made possible the tracking of the living standard of the same persons over the period of one year.]
Along with these two comprehensive surveys, conducted on nationally and regionally representative samples which were to give a picture of the general population, there were also two surveys with particular emphasis on vulnerable groups. In 2002, it was the survey of the living standard of Family Income Support recipients, with the aim of validating this state-supported program of social welfare. In 2003, the survey of Roma from Roma settlements was conducted. Since all previous experience indicated that this was one of the most vulnerable groups on the territory of Serbia and Montenegro, but no ample research on the poverty of the Roma population had been made, the aim of the survey was to compare the poverty of this group with the poverty of the basic population and to establish which categories of the Roma population were at the greatest risk of poverty in 2003. However, it is necessary to stress that the LSMS of the Roma population comprised the potentially most imperilled Roma, while Roma integrated into the main population were not included in this study.
The surveys were conducted on the whole territory of Serbia (without Kosovo and Metohija).
Sample survey data [ssd]
The sample frame for both surveys of the general population (LSMS) in 2002 and 2003 consisted of all permanent residents of Serbia, without the population of Kosovo and Metohija, according to the definition of permanently resident population contained in the UN Recommendations for Population Censuses, which were applied in the 2002 Census of Population in the Republic of Serbia. Therefore, permanent residents were all persons living in the territory of Serbia for longer than one year, with the exception of diplomatic and consular staff.
The sample frame for the survey of Family Income Support recipients included all current recipients of this program on the territory of Serbia based on the official list of recipients given by Ministry of Social affairs.
The definition of the Roma population from Roma settlements faced obstacles since precise data on the total number of the Roma population in Serbia are not available. According to the last population Census from 2002 there were 108,000 Roma citizens, but the data from the Census are thought to significantly underestimate the total number of the Roma population. However, since no other more precise data were available, this number was taken as the basis for the estimate of the Roma population from Roma settlements. According to the 2002 Census, settlements in which at least 7% of the total population declared themselves to be of Roma nationality were selected. A total of 83%, or 90,000, self-declared Roma lived in the settlements defined in this way, and this number was taken as the sample frame for Roma from Roma settlements.
Planned sample: In 2002 the planned size of the sample of the general population was 6,500 households. The sample was both nationally and regionally representative (representative for each individual stratum). In 2003 the planned panel sample size was 3,000 households. In order to preserve the representative quality of the sample, we kept every other census block unit of the large sample realized in 2002. This way we kept the identical allocation by strata. In the selected census block units, the same households were interviewed as in the basic survey in 2002. The planned sample of Family Income Support recipients in 2002 and of Roma from Roma settlements in 2003 was 500 households for each group.
Sample type: In both national surveys the implemented sample was a two-stage stratified sample. Units of the first stage were enumeration districts, and units of the second stage were the households. In the basic 2002 survey, enumeration districts were selected with probability proportional to the number of households, so that enumeration districts with a larger number of households had a higher probability of selection. In the repeated survey in 2003, first-stage units (census block units) were selected from the basic sample obtained in 2002 by including only even-numbered census block units. In practice this meant that every second census block unit from the previous survey was included in the sample. In each selected enumeration district the same households interviewed in the previous round were included and interviewed. On finishing the survey in 2003, the cases were merged at both the household and member level.
Stratification: Municipalities are stratified into the following six territorial strata: Vojvodina, Belgrade, Western Serbia, Central Serbia (Šumadija and Pomoravlje), Eastern Serbia and South-east Serbia. Primary units of selection are further stratified into enumeration districts which belong to urban type of settlements and enumeration districts which belong to rural type of settlement.
The sample of Family Income Support recipients represented cases chosen randomly from the official list of recipients provided by the Ministry of Social Affairs. The sample of Roma from Roma settlements was, as in the national survey, a two-stage stratified sample, but the units of the first stage were settlements where the Roma population was represented at a percentage over 7%, and the units of the second stage were Roma households. Settlements are stratified into three territorial strata: Vojvodina, Belgrade and Central Serbia.
Face-to-face [f2f]
In all surveys the same questionnaire with minimal changes was used. It included different modules, topically separate areas which had an aim of perceiving the living standard of households from different angles. Topic areas were the following:
1. Roster with demography.
2. Housing conditions and durables module with information on the age of durables owned by a household, with a special block focused on collecting information on energy billing, payments, and usage.
3. Diary of food expenditures (weekly), including home production, gifts and transfers in kind.
4. Questionnaire of main expenditure-based recall periods sufficient to enable construction of annual consumption at the household level, including home production, gifts and transfers in kind.
5. Agricultural production for all households which cultivate 10+ acres of land or who breed cattle.
6. Participation and social transfers module with detailed breakdown by programs.
7. Labour Market module in line with a simplified version of the Labour Force Survey (LFS), with special additional questions to capture various informal sector activities, and providing information on earnings.
8. Health, with a focus on utilization of services and expenditures (including informal payments).
9. Education module, which incorporated pre-school, compulsory primary education, secondary education and university education.
10. Special income block, focusing on sources of income not covered in other parts (with a focus on remittances).
During field work, interviewers kept a precise diary of interviews, recording both successful and unsuccessful visits. Particular attention was paid to reasons why some households were not interviewed. Separate marks were given for households which were not interviewed due to refusal and for cases when a given household could not be found on the territory of the chosen census block.
In 2002 a total of 7,491 households were contacted. Of this number a total of 6,386 households in 621 census rounds were interviewed. Interviewers did not manage to collect the data for 1,106 or 14.8% of selected households. Out of this number 634 households
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Water companies in the UK are responsible for testing the quality of drinking water. This dataset contains the results of samples taken from the taps in domestic households to make sure they meet the standards set out by UK and European legislation. This data shows the location, date, and measured levels of determinands set out by the Drinking Water Inspectorate (DWI).
Key Definitions
Aggregation: Process involving summarising or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
Anonymisation: Anonymised data is a type of information sanitisation in which data anonymisation tools encrypt or remove personally identifiable information from datasets for the purpose of preserving a data subject's privacy.
Dataset: Structured and organised collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
Determinand: A constituent or property of drinking water which can be determined or estimated.
DWI: Drinking Water Inspectorate, an organisation “providing independent reassurance that water supplies in England and Wales are safe and drinking water quality is acceptable to consumers.”
DWI Determinands: Constituents or properties that are tested for when evaluating a sample for its quality as per the guidance of the DWI. For this dataset, only determinands with “point of compliance” as “customer taps” are included.
Granularity: Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours.
ID: Abbreviation for Identification that refers to any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.
LSOA: Lower Layer Super Output Area, made up of small geographic areas used for statistical and administrative purposes by the Office for National Statistics. It is designed to have homogeneous populations in terms of population size, making them suitable for statistical analysis and reporting. Each LSOA is built from groups of contiguous Output Areas with an average of about 1,500 residents or 650 households, allowing for granular data collection useful for analysis, planning and policy-making while ensuring privacy.
ONS: Office for National Statistics.
Open Data Triage: The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.
Sample: A sample is a representative segment or portion of water taken from a larger whole for the purpose of analysing or testing to ensure compliance with safety and quality standards.
Schema: Structure for organizing and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
Units: Standard measurements used to quantify and compare different physical quantities.
Water Quality: The chemical, physical, biological, and radiological characteristics of water, typically in relation to its suitability for a specific purpose, such as drinking, swimming, or ecological health.
It is determined by assessing a variety of parameters, including but not limited to pH, turbidity, microbial content, dissolved oxygen, presence of substances and temperature.
Data History
Data Origin: These samples were taken from customer taps. They were then analysed for water quality, and the results were uploaded to a database. This dataset is an extract from this database.
Data Triage Considerations
Granularity: Is it useful to share results as averages or individual? We decided to share individual results as the lowest level of granularity.
Anonymisation: It is a requirement that this data cannot be used to identify a singular person or household. We discussed many options for aggregating the data to a specific geography to ensure this requirement is met. The following geographical aggregations were discussed:
• Water Supply Zone (WSZ) – Limits interoperability with other datasets
• Postcode – Some postcodes contain very few households and may not offer necessary anonymisation
• Postal Sector – Deemed not granular enough in highly populated areas
• Rounded Co-ordinates – Not a recognised standard and may cause overlapping areas
• MSOA – Deemed not granular enough
• LSOA – Agreed as a recognised standard appropriate for England and Wales
• Data Zones – Agreed as a recognised standard appropriate for Scotland
Data Triage Review Frequency: Annually unless otherwise requested
Publish Frequency: Annually
Data Specifications
• Each dataset will cover a year of samples in a calendar year
• This dataset will be published annually
• Historical datasets will be published as far back as 2016, from the introduction of The Water Supply (Water Quality) Regulations 2016
• The determinands included in the dataset are as per the list that is required to be reported to the Drinking Water Inspectorate
• A small proportion of samples could not be allocated to an LSOA – these represented less than 0.1% of samples and were removed from the dataset in 2023
• The postcode to LSOA lookup table used for 2022 was not available when 2023 data was processed; see supplementary information for the lookup table applied to each calendar year of data
Context
Many UK water companies provide a search tool on their websites where you can search for water quality in your area by postcode. The results of the search may identify the water supply zone that supplies the postcode searched. Water supply zones are not linked to LSOAs, which means the results may differ from this dataset. Some sample results are influenced by internal plumbing and may not be representative of drinking water quality in the wider area. Some samples are tested on site and others are sent to scientific laboratories.
Supplementary information
Below is a curated selection of links for additional reading, which provide a deeper understanding of this dataset.
1. Drinking Water Inspectorate Standards and Regulations: https://www.dwi.gov.uk/drinking-water-standards-and-regulations/
2. LSOA (England and Wales) and Data Zone (Scotland): https://www.nrscotland.gov.uk/files/geography/2011-census/geography-bckground-info-comparison-of-thresholds.pdf
3. Description for LSOA boundaries by the ONS: https://www.ons.gov.uk/methodology/geography/ukgeographies/censusgeographies/census2021geographies
4. Postcode to LSOA lookup tables (2022 calendar year data): https://geoportal.statistics.gov.uk/datasets/postcode-to-2021-census-output-area-to-lower-layer-super-output-area-to-middle-layer-super-output-area-to-local-authority-district-august-2023-lookup-in-the-uk/about
5. Postcode to LSOA lookup tables (2023 calendar year data): https://geoportal.statistics.gov.uk/datasets/b8451168e985446eb8269328615dec62/about
6. Legislation history: https://www.dwi.gov.uk/water-companies/legislation/
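As a rough illustration of the postcode-to-LSOA allocation step described above, the sketch below joins samples to an ONS lookup table with pandas; the file and column names are assumptions rather than this dataset's actual schema.

```python
# Hypothetical sketch of allocating samples to LSOAs via a postcode lookup
# table, as described in the processing notes above. Column names
# ('postcode', 'lsoa21cd') and file names are assumptions.
import pandas as pd

samples = pd.read_csv("water_quality_samples.csv")    # hypothetical
lookup = pd.read_csv("postcode_to_lsoa_lookup.csv")   # hypothetical

merged = samples.merge(
    lookup[["postcode", "lsoa21cd"]],
    on="postcode",
    how="left",
)

# Samples with no matching LSOA (a small proportion, per the notes above)
unallocated = merged["lsoa21cd"].isna().mean()
print(f"{unallocated:.2%} of samples could not be allocated to an LSOA")
```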
The dataset contains 39148 years of sea level data from 1355 station records, with some stations having alternative versions of the records provided from different sources. GESLA-2 data may be obtained from www.gesla.org. The site also contains the file format description and other information. The text files contain headers with lines of metadata followed by the data itself in a simple column format. All the tide gauge data in GESLA-2 have hourly or more frequent sampling. The basic data from the US National Oceanic and Atmospheric Administration (NOAA) are 6-minute values, but for GESLA-2 purposes we instead settled on their readily available 'verified hourly values'. Most UK records are also hourly values up to the 1990s, and 15-minute values thereafter. Records from some other sources may have different sampling, and records should be inspected individually if sampling considerations are considered critical to an analysis.
The GESLA-2 dataset has global coverage and better geographical coverage than GESLA-1, with stations in new regions (defined by stations in the new dataset located more than 50 km from any station in GESLA-1). For example, major improvements can be seen to have been made for the Mediterranean and Baltic Seas, Japan, New Zealand and the African coastline south of the Equator. The earliest measurements are from Brest, France (04/01/1846) and the latest from Cuxhaven, Germany and Esbjerg, Denmark (01/05/2015). There are 29 years in an average record, although the actual number of years varies from only 1 at short-lived sites, to 167 in the case of Brest, France. Most of the measurements in GESLA-2 were made during the second half of the twentieth century. The most globally representative analyses of sea level variability with GESLA-2 will be those that focus on the period since about 1970.
Historically, delayed-mode data comprised spot values of sea level every hour, obtained from inspection of the ink trace on a tide gauge chart. Nowadays tide gauge data loggers provide data electronically. Data can be either spot values, integrated (averaged) values over specified periods (e.g. 6 minutes), or integrated over a specified period within a longer sampling period (e.g. averaged over 3 minutes every 6 minutes).
The construction of this dataset is fundamental to research in sea level variability and also to practical aspects of coastal engineering. One component is concerned with encouraging countries to install tide gauges at locations where none exist, to operate them to internationally agreed standards, and to make the data available to interested users. A second component is concerned with the collection of data from the global set of tide gauges, whether gauges have originated through the GLOSS programme or not, and to make the data available.
The records in GESLA-2 will have had some form of quality control undertaken by the data providers. However, the extent to which that control will have been undertaken will inevitably vary between providers and with time. In most cases, no further quality control has been made beyond that already undertaken by the data providers. Although there are many individual contributions, over a quarter of the station-years are provided by the research quality dataset of UHSLC.
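As a rough reading sketch (the authoritative file format description is available at www.gesla.org), the snippet below skips metadata header lines and loads the column data; the '#' header convention and the generic column handling are assumptions, not the documented format.

```python
# Rough reading sketch for a GESLA-2 text file: skip the metadata header and
# load the whitespace-separated data columns. The '#' header prefix and the
# column layout are assumptions; consult the format description at
# www.gesla.org for the actual specification.
import pandas as pd

def read_gesla(path: str) -> pd.DataFrame:
    rows = []
    with open(path) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue  # metadata header or blank line (assumed convention)
            rows.append(line.split())
    return pd.DataFrame(rows)

df = read_gesla("brest_france_example.txt")  # hypothetical file name
print(df.head())
```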
Contributors include: British Oceanographic Data Centre; University of Hawaii Sea Level Center; Japan Meteorological Agency; US National Oceanic and Atmospheric Administration; Puertos del Estado, Spain; Marine Environmental Data Service, Canada; Instituto Espanol de Oceanografica, Spain; idromare, Italy; Swedish Meteorological and Hydrological Institute; Federal Maritime and Hydrographic Agency, Germany; Finnish Meteorological Institute; Service hydrographique et oceanographique de la Marine, France; Rijkswaterstaat, Netherlands; Danish Meteorological Institute; Norwegian Hydrographic Service; Icelandic Coastguard Service; Istituto Talassographico di Trieste; Venice Commune, Italy;
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"This database has been collected by ESMAP as validation exercise for the multi-tier energy access questionnaire in Malawi. The multi-tier definition and measurement of energy access is an innovative approach that goes beyond the traditional binary measures of access. This approach defines energy access as the ability to access energy that is adequate, available when needed, reliable, good quality, affordable, legal, convenient, healthy, and safe across household, productive, and community uses. The Malawi database captures all the variables needed to calculate the attribute to measure energy access (availability, reliability, affordability, quality, duration…) under the new definition. Data are not representative of the country. For further information on the Multi-Tier Framework for Measuring Energy Access visit https://www.esmap.org/node/55526"
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
The International Bank for Reconstruction and Development (IBRD) loans are public and publicly guaranteed debt extended by the World Bank Group. IBRD loans are made to, or guaranteed by, countries that are members of IBRD. IBRD may also make loans to IFC. IBRD lends at market rates. Data are in U.S. dollars calculated using historical rates. This dataset contains historical snapshots of the Statement of Loans including the latest available snapshots. The World Bank complies with all sanctions applicable to World Bank transactions.
Description of columns
Column | Description
---|---
End of Period | The date as of which balances are shown in the report.
Loan Number | For IBRD loans and IDA credits or grants, a loan number consists of the organization prefix (IBRD/IDA) and a five-character label that uniquely identifies the loan within the organization. In IDA, all grant labels start with the letter 'H'.
Region | Country lending is grouped into regions based on the current World Bank administrative (rather than geographic) region where project implementation takes place. The "Other" region is used for loans to the IFC.
Country Code | Country code according to the World Bank country list. Might be different from the ISO country code.
Country | Country to which the loan has been issued. Loans to the IFC are included under the country "World".
Borrower | The representative of the borrower to which the Bank loan is made.
Guarantor Country Code | Country code of the guarantor according to the World Bank country list. Might be different from the ISO country code.
Guarantor | The guarantor guarantees repayment to the Bank if the borrower does not repay.
Loan Type | A type of loan/loan instrument for which distinctive accounting and/or other actions need to be performed. See the Data Dictionary attached in the About section, or the Data Dictionary dataset available from the list of all datasets, for details.
Loan Status | Status of the loan. See the Data Dictionary attached in the About section, or the Data Dictionary dataset available from the list of all datasets, for status descriptions.
Descriptions of the remaining columns can be found at this link. A small usage sketch based on the columns above follows.
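The sketch below is for illustration only; the CSV filename is hypothetical, and only the column names listed in the table above are used.
```python
import pandas as pd

# Hypothetical export of the Statement of Loans snapshots; column names follow the table above.
loans = pd.read_csv("ibrd_statement_of_loans.csv")

# Keep only the most recent snapshot, since the file also holds historical snapshots.
loans["End of Period"] = pd.to_datetime(loans["End of Period"])
latest = loans[loans["End of Period"] == loans["End of Period"].max()]

# Count distinct loans per World Bank administrative region and loan status.
summary = latest.groupby(["Region", "Loan Status"])["Loan Number"].nunique()
print(summary)
```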
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets were created as part of a study involving an experiment with a helpdesk team at an international software company. The goal was to implement an automated performance appraisal model that evaluates the team based on issue reports and key features derived from classifying the messages exchanged with customers using Dialog Acts. The data were extracted from a PostgreSQL database and curated to present aggregated views of helpdesk tickets reported between January 2016 and March 2023. Certain fields have been anonymized (masked) to protect the data owner's privacy while preserving the overall meaning of the information. The datasets are listed below; a small sketch of combining them follows the lists.
- issues.csv: Holds information for all reported tickets, showing each ticket's category, priority, who reported the issue, the related project, who was assigned to resolve the ticket, the start time, the resolution time, and how many seconds the ticket spent in each resolution step.
- issues_change_history.csv: Shows when the ticket assignee and status were changed. This dataset helps calculate the time spent on each step.
- issues_snapshots.csv: Contains the same records as issues.csv but duplicates the tickets that multiple assignees handled; each record is one processing cycle per assignee.
- scored_issues_snapshot_sample.xlsx: A stratified, representative sample extracted from the tickets and handed to an annotator (the helpdesk manager) to appraise the resolution performance against three targets, where 5 is the highest score and 1 is the lowest.
- sample_utterances.csv: Contains the messages (comments) exchanged between the reporters and the helpdesk team. It only contains the curated messages for the issues listed in scored_issues_snapshot_sample.xlsx, as those were the focus of the initial study.
The following files are guidelines on how to work with and interpret the datasets:
- FEATURES.md: Describes the datasets' features (fields).
- EXAMPLE.md: Shows an example of an issue across all datasets so the reader can understand the relations between them.
- process-flow.png: A diagram of the steps the helpdesk team follows to resolve an issue.
These datasets are also valuable for many other experiments, such as:
- Count predictions
- Regression
- Association rule mining
- Natural language processing
- Classification
- Clustering
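The sketch below shows one way the files could be combined with pandas. The join key and column names (e.g. "issue_id") are illustrative placeholders; the actual field names are documented in FEATURES.md.
```python
import pandas as pd

# Load the curated helpdesk datasets.
issues = pd.read_csv("issues.csv")
history = pd.read_csv("issues_change_history.csv")
snapshots = pd.read_csv("issues_snapshots.csv")

# Attach the change history to each per-assignee processing cycle (assumed join key).
merged = snapshots.merge(history, on="issue_id", how="left")

# Example aggregation: how many processing cycles (assignee hand-offs) each ticket had.
cycles_per_ticket = snapshots.groupby("issue_id").size()
print(cycles_per_ticket.describe())
```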
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Zoning composed of all urban transport perimeters (PTU). The information on the layer's attributes is up to date as of 01/01/2017, with the exception of the cumulative kilometres of lines. The geographical layer was produced by the GIS workshop on 25/07/2017.
An urban transport perimeter covers the territory of a municipality, or the territorial jurisdiction of a public establishment, that has been entrusted with organising public passenger transport. At the request of the mayor or the president of the public establishment, the representative of the State records the establishment of the perimeter, after consulting the General Council where the departmental plan is concerned. [...] In the overseas departments, the representative of the State, on a proposal from the mayor or the president of the public establishment, may define a perimeter that excludes certain parts of the municipality's territory. The urban transport perimeter may also cover several adjacent municipalities that have decided to organise a public passenger transport service jointly; the establishment and delimitation of such a perimeter are determined by the representative of the State at the request of the mayors of the municipalities concerned, after the opinion of the General Council. Only PTUs that are in development or validated form part of this class of objects. Save in particular cases defined by law, the decree creating a conurbation community or an urban community, or the order transforming a public establishment of inter-municipal cooperation into a conurbation community or an urban community, constitutes the establishment of an urban transport perimeter under the law on urban solidarity and renewal (SRU Law). The perimeters over which the SRU joint unions, established pursuant to the SRU Law, exercise their powers are not urban transport perimeters.
These data correspond to a macroscopic-level definition of the PTU: the content of the PTU comprising regulations and cartographic documents is not included in these data.