The Highway-Runoff Database (HRDB) was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration (FHWA), to provide planning-level information for decision makers, planners, and highway engineers to assess and mitigate possible adverse effects of highway runoff on the Nation's receiving waters. The HRDB was assembled by using a Microsoft Access database application to facilitate use of the data and to calculate runoff-quality statistics with methods that properly handle censored-concentration data. This data release provides highway-runoff data, including information about monitoring sites, precipitation, runoff, and event-mean concentrations of water-quality constituents. The dataset was compiled from 37 studies as documented in 113 scientific or technical reports. It includes data from 242 highway sites across the country and from 6,837 storm events with dates ranging from April 1975 to November 2017. These data therefore span more than 40 years; vehicle emissions and background sources of highway-runoff constituents have changed markedly during this time. For example, some of the early data are affected by the use of leaded gasoline, phosphorus-based detergents, and industrial atmospheric deposition. The dataset includes 106,441 concentration values with data for 414 different water-quality constituents. The dataset was assembled from various sources, and the original data were collected and analyzed by using various protocols. Where possible, the USGS worked with State departments of transportation and the original researchers to obtain, document, and verify the data that were included in the HRDB. This new version (1.1.0) of the database contains software updates to provide data-quality information within the Graphical User Interface (GUI), calculate statistics for multiple sites in batch mode, and output additional statistics. However, inclusion in this dataset does not constitute endorsement by the USGS or the FHWA. Users of these data are responsible for ensuring that the data are complete and correct and that they are suitable for their intended purposes.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Missing observations in trait datasets pose an obstacle for analyses in myriad biological disciplines. Considering the mixed results of imputation, the wide variety of available methods, and the varied structure of real trait datasets, a framework for selecting a suitable imputation method is advantageous. We invoked a real data-driven simulation strategy to select an imputation method for a given mixed-type (categorical, count, continuous) target dataset. Candidate methods included mean/mode imputation, k-nearest neighbour, random forests, and multivariate imputation by chained equations (MICE). Using a trait dataset of squamates (lizards and amphisbaenians; order: Squamata) as a target dataset, a complete-case dataset consisting of species with nearly complete information was formed for the imputation method selection. Missing data were induced by removing values from this dataset under different missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). For each method, combinations with and without phylogenetic information from single-gene (nuclear and mitochondrial) or multigene trees were used to impute the missing values for five numerical and two categorical traits. The performances of the methods were evaluated under each missingness mechanism by determining the mean squared error and proportion falsely classified rates for numerical and categorical traits, respectively. A random forest method supplemented with a nuclear-derived phylogeny resulted in the lowest error rates for the majority of traits, and this method was used to impute missing values in the original dataset. Data with imputed values better reflected the characteristics and distributions of the original data compared to complete-case data. However, caution should be taken when imputing trait data as phylogeny did not always improve performance for every trait and in every scenario. Ultimately, these results support the use of a real data-driven simulation strategy for selecting a suitable imputation method for a given mixed-type trait dataset.
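The method-selection loop described above can be illustrated with a small sketch (not the authors' pipeline, which also handled categorical traits and phylogenetic predictors): values are masked completely at random in a complete-case table, each candidate imputer fills them back in, and the error on the masked cells is compared. The file name complete.csv, the 20% MCAR rate, and the scikit-learn imputers standing in for the candidate methods are illustrative assumptions.

```python
# Minimal sketch of the data-driven imputation comparison (not the authors' code).
# Assumes a complete-case table of numeric traits in `complete.csv`; the file name,
# the MCAR masking rate, and the candidate imputers are illustrative choices.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
complete = pd.read_csv("complete.csv")          # complete-case numeric traits
X_true = complete.to_numpy(dtype=float)

# Induce MCAR missingness: drop 20% of cells completely at random.
mask = rng.random(X_true.shape) < 0.20
X_missing = X_true.copy()
X_missing[mask] = np.nan

candidates = {
    "mean": SimpleImputer(strategy="mean"),
    "kNN": KNNImputer(n_neighbors=5),
    "RF (missForest-like)": IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=100, random_state=0),
        max_iter=10, random_state=0),
}

for name, imputer in candidates.items():
    X_imputed = imputer.fit_transform(X_missing)
    mse = np.mean((X_imputed[mask] - X_true[mask]) ** 2)   # error on masked cells only
    print(f"{name}: MSE on masked cells = {mse:.4f}")
```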
Objective: The objective of this study was to generate a new and consistent set of values for changes in travel time, reliability and comfort for the various modes of transport in The Netherlands. These values serve as key input for Social Cost-Benefit Analyses (SCBA) for mobility projects.
Scope: Compared with the previous Dutch VTT study, which dates back to 2013, national values of travel time have been laid down for walking and cycling in this study for the first time. The value of travel time for transport to and from airports (in The Netherlands) for travel by airplane has also been determined for the first time. Also new are the comfort-related values of travel time: the comfort multipliers indicate the level of comfort in relation to passenger volume for train and BTM, and the comfort of the walking and cycling infrastructure.
Methodology: The national average value of travel time (VTT) and value of travel time reliability (VTTR) for passenger transport are derived from Stated Preference (SP) choice experiments. A questionnaire containing questions on trip characteristics, personal characteristics and several choice experiments is distributed among a target audience. In each experiment, respondents are asked to choose between hypothetical alternatives, each describing a trip in terms of travel time, travel cost and possibly other characteristics; respondents indicate each time which alternative they prefer. From these choices the value of travel time and travel time reliability can be inferred. The VTT and VTTR values are subsequently estimated by employing discrete choice models: models aiming to explain the observed choices made by the respondents in the experiment. Coefficients and interaction factors are added to these models to improve the model fit, that is, to boost the model's ability to explain the data. Once values have been obtained from these models, these numbers are applied to the sample of respondents to calculate their respective VTT/VTTR. Finally, these values are weighted to match statistics from the recent Dutch national travel survey. As a result, a VTT and VTTR is derived for each mode-purpose (business, commute and other) combination.
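The core idea can be shown with a toy example: in a binary logit estimated on the differences in travel time and cost between two alternatives, the implied value of travel time is the ratio of the time and cost coefficients. The sketch below uses synthetic data and statsmodels; it is not the model specification estimated in the study.

```python
# Illustrative sketch of deriving a VTT from a binary choice experiment
# (a toy logit on synthetic data, not the actual model estimated in the study).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
# Differences between alternatives A and B in travel time (hours) and cost (euros).
d_time = rng.uniform(-0.5, 0.5, n)
d_cost = rng.uniform(-5.0, 5.0, n)

# Simulate choices from a known utility: U = -2.0*time - 0.2*cost + logistic noise.
beta_time, beta_cost = -2.0, -0.2
utility_diff = beta_time * d_time + beta_cost * d_cost
p_choose_A = 1.0 / (1.0 + np.exp(-utility_diff))
choice_A = (rng.random(n) < p_choose_A).astype(int)

# Estimate the binary logit and recover the implied value of travel time.
X = np.column_stack([d_time, d_cost])
model = sm.Logit(choice_A, X).fit(disp=False)
b_time, b_cost = model.params
vtt = b_time / b_cost          # euros per hour, since time is measured in hours
print(f"estimated VTT = {vtt:.1f} euro/hour (true value: {beta_time / beta_cost:.1f})")
```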
Data collection: About 9,700 respondents participated in the passenger travel survey designed for the national VTT study. Roughly 80% of respondents were recruited from an internet panel and about 20% were recruited by an interviewer intercepting them during their travel. The survey contained 10 unique SP experiments, dedicated to various modes of transport (car, train, bus tram metro (BTM), airplane, recreational navigation, walking and cycling) and valuations (value of travel time, reliability, comfort and more). Each respondent participated in two of them, based on their current travel activities. The choice experiments were dynamically constructed by a pivotal experimental design, so that each respondent received realistic choice tasks relatable to their current travel pattern.
The main data collection phase occurred in June and September 2022. Response rates for both the internet panel and intercept recruitment were high compared with previous studies, especially the previous Dutch VTT survey from 2009/2011. The use of a high-quality internet panel and the increased rewards for participation in this new study are the likely causes of this improvement. A filtering procedure based on a list of predetermined conditions was applied to the respondent data. Entries were excluded when it was suspected that something had gone wrong during the survey, either because of a problem in the survey, a misunderstanding, or a mistake made by the respondent. About 80% of the observations remained for further analysis after this procedure, resulting in a final dataset of more than 7,500 high-quality, reliable responses.
Significance was responsible for data gathering and analysis on behalf of the Netherlands Institute for Transport Policy Analysis (KiM).
Dataset: Note that the uploaded datasets contain all (anonymised) data, including data from the various pilot studies and other data that were filtered out by the above-mentioned filtering procedure to obtain the final dataset.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
A new approach for the estimation of soil organic carbon (SOC) pools north of the tree line has been developed based on synthetic aperture radar (SAR; ENVISAT Advanced SAR Global Monitoring mode) data. SOC values are directly determined from backscatter values instead of upscaling using land cover or soil classes. The multi-mode capability of SAR allows application across scales. It can be shown that measurements in C band under frozen conditions represent vegetation and surface structure properties which relate to soil properties, specifically SOC. It is estimated that at least 29 Pg C is stored in the upper 30 cm of soils north of the tree line. This is approximately 25 % less than stocks derived from the soil-map-based Northern Circumpolar Soil Carbon Database (NCSCD). The total stored carbon is underestimated since the established empirical relationship is not valid for peatlands or strongly cryoturbated soils. The approach does, however, provide the first spatially consistent account of soil organic carbon across the Arctic. Furthermore, it could be shown that values obtained from 1 km resolution SAR correspond to accounts based on a high spatial resolution (2 m) land cover map over a study area of about 7 × 7 km in NE Siberia. The approach can also potentially be transferred to medium-resolution C-band SAR data such as ENVISAT ASAR Wide Swath with ~120 m resolution, but it is in general limited to regions without woody vegetation. Global Monitoring-mode-derived SOC increases with unfrozen period length. This indicates the importance of this parameter for modelling of the spatial distribution of soil organic carbon storage.
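The retrieval idea, determining SOC directly from backscatter, can be sketched generically as calibrating an empirical relationship at reference sites and applying it pixel-wise. The linear form, the numbers, and the unit handling below are illustrative assumptions, not the relationship published with this dataset.

```python
# Generic calibration sketch (not the published retrieval model): relate SOC stocks
# at reference sites to frozen-season C-band backscatter (dB) and apply the fit
# pixel-wise. The linear form, values, and NoData handling are assumptions.
import numpy as np

# Hypothetical reference data: backscatter (dB) and measured SOC (kg C m^-2) at sites.
sigma0_ref = np.array([-22.1, -20.4, -18.9, -17.2, -15.8, -14.5])
soc_ref = np.array([4.2, 6.0, 8.1, 10.5, 12.3, 14.8])

# Fit a simple empirical relationship SOC = a * sigma0 + b.
a, b = np.polyfit(sigma0_ref, soc_ref, deg=1)

# Apply to a (mock) 1 km backscatter mosaic; NaN marks open water / NoData.
sigma0_grid = np.array([[-21.0, -19.5, np.nan],
                        [-16.0, -15.1, -18.2]])
soc_grid = a * sigma0_grid + b                    # kg C m^-2 per pixel

pixel_area_m2 = 1.0e6                             # 1 km x 1 km pixels
total_pg_c = np.nansum(soc_grid) * pixel_area_m2 / 1.0e12   # kg C -> Pg C
print(soc_grid)
print(f"total stock in mock grid: {total_pg_c:.2e} Pg C")
```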
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To classify the topological states of lasing modes, one needs to solve the rate equations and then analyze the time-dependent data using different machine learning approaches: 1) a fixed library, 2) a top-down adaptive library, and 3) a bottom-up datative library. In this paper, we consider a coupled resonator array, the so-called SSH lattice, with 21 sites. To build the libraries, we used 2000 samples, meaning that we considered 2000 different sets of gain and linear loss coefficients that form the 2D parameter space presented in the paper. All the coefficients are normalised so that no units are required when solving the equations and presenting the results.

First, the data file "dataset_params.txt" contains the parameters used for the time evolution of the system, and the initial conditions, for each sample generated. It contains 24 columns, where the first three columns are the sample index, the gain coefficient (g_A) for the A sites, and the linear loss coefficient (g_AB) for the A and B sites. Note that we have 2000 samples, equal to the number of points in the graphs in the paper, and that the sample index was not sorted in ascending/descending order because the order is not important. The remaining 21 columns are the initial values of the site amplitudes, x(t=0), that were used when solving the rate equations Eq. (2) and Eq. (3) with the 4th-order Runge-Kutta method. Here, we used 0.01 for all the sites (N=21) and all the different samples (2000).

Second, the data file "dataset_time_series.txt" contains the time evolution of the system for each sample generated. The first column is the sample index; the remaining 42 columns are the complex amplitudes of the mode, corresponding to Re(a(x=1,t=t0)), Im(a(x=1,t=t0)), Re(a(x=2,t=t0)), Im(a(x=2,t=t0)), ..., Re(a(x=1,t=t1)), ... .

Third, there are 8 data files that correspond to the phase diagrams calculated using the methods described in the manuscript. All of them have the same structure and contain a derived phase diagram: the first and second columns are the linear loss coefficient (g_AB) and the gain coefficient (g_A), and the third column is the index of the different phases.
Filenames: ahdmd?tab_class_ij.txt, ahdmd?tab_class_ij_bis2.txt, xc_tab_class_red_i_5.txt, xc_tab_class_red_i_1.txt, xc_tab_class_red_i_-2.txt, xc_tab_class_red_ij_0_3.txt, xc_tab_class_red_ij_0_7.txt, xc_tab_class_red_ij_3_7.txt

The data files are used in the figures as described below.

Files used in figures:
Figure 1: dataset_params.txt, dataset_time_series.txt
Figure 2: panel a: ahdmd?tab_class_ij.txt; panel b: ahdmd?tab_class_ij_bis2.txt
Figure 3: xc_tab_class_red_i_5.txt
Figure 4: panel a: the method is applied for each hyper-parameter value to the time series in dataset_time_series.txt, then the number of classes is counted; panel b: xc_tab_class_red_i_1.txt; panel c: xc_tab_class_red_i_-2.txt
Figure 5: xc_tab_class_red_ij_0_3.txt
Figure 6: panel a: the method is applied for each hyper-parameter value to the time series in dataset_time_series.txt, then the number of classes is counted; panel b: xc_tab_class_red_ij_0_7.txt; panel c: xc_tab_class_red_ij_3_7.txt
Figures S1-S3: use the samples in dataset_time_series.txt corresponding to the parameters gamma_AB, g_A of panels d-e of Figure 1
Figure S4: apply the described method with each decomposition method to the time series in dataset_time_series.txt
Figure S5: apply the described method to the time series in dataset_time_series.txt
Figure S6: apply the described method to the time series in dataset_time_series.txt

Research results based upon these data are published at https://doi.org/10.1038/s42005-023-01230-z
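A minimal loading sketch for the two main data files is given below. It assumes whitespace-delimited text and the column layout described above; the exact row layout of the time-series file (how rows map to samples and time steps) should be verified against the description before use.

```python
# Minimal loading sketch for dataset_params.txt and dataset_time_series.txt
# (assumes whitespace-delimited text and the column layout described above;
# verify the row layout of the time-series file against the description).
import numpy as np

params = np.loadtxt("dataset_params.txt")        # expected shape: (2000, 24)
sample_idx = params[:, 0].astype(int)            # sample index (unsorted)
g_A, g_AB = params[:, 1], params[:, 2]           # gain and linear loss coefficients
x0 = params[:, 3:]                               # 21 initial site amplitudes (all 0.01)

ts = np.loadtxt("dataset_time_series.txt")
ts_idx = ts[:, 0].astype(int)                    # sample index per row
# Pair alternating Re/Im columns into complex site amplitudes a(x, t).
amplitudes = ts[:, 1::2] + 1j * ts[:, 2::2]

# Example: pull all rows belonging to the first listed sample.
a_sample0 = amplitudes[ts_idx == sample_idx[0]]
print(params.shape, amplitudes.shape, a_sample0.shape)
```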
Syngenta is committed to increasing crop productivity and to using limited resources such as land, water and inputs more efficiently. Since 2014, Syngenta has been measuring trends in agricultural input efficiency on a global network of real farms. The Good Growth Plan dataset shows aggregated productivity and resource efficiency indicators by harvest year. The data has been collected from more than 4,000 farms and covers more than 20 different crops in 46 countries. The data (except USA data and for Barley in UK, Germany, Poland, Czech Republic, France and Spain) was collected, consolidated and reported by Kynetec (previously Market Probe), an independent market research agency. It can be used as benchmarks for crop yield and input efficiency.
National coverage
Agricultural holdings
Sample survey data [ssd]
A. Sample design: Farms are grouped in clusters, which represent a crop grown in an area with homogeneous agro-ecological conditions and include comparable types of farms. The sample includes reference and benchmark farms. The reference farms were selected by Syngenta, and the benchmark farms were randomly selected by Kynetec within the same cluster.
B. Sample size: Sample sizes for each cluster are determined with the aim of measuring statistically significant increases in crop efficiency over time. This is done by Kynetec based on target productivity increases and assumptions regarding the variability of farm metrics in each cluster. The smaller the expected increase, the larger the sample size needed to measure significant differences over time. Variability within clusters is assumed based on public research and expert opinion. In addition, growers are grouped in clusters as a means of keeping variances under control, as well as distinguishing between growers in terms of crop size, region and technological level. A minimum sample size of 20 interviews per cluster is needed. The minimum number of reference farms is 5 of 20; the optimal number is 10 of 20 (balanced sample).
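As an illustration of this kind of sample-size reasoning, the sketch below runs a two-sample power calculation with statsmodels; the expected increase, assumed variability, significance level and power are placeholders rather than Kynetec's actual inputs.

```python
# Illustrative power calculation for the cluster sample sizes (the assumed effect
# size, variance, alpha, and power are placeholders, not Kynetec's actual inputs).
from statsmodels.stats.power import TTestIndPower

expected_increase = 0.05      # e.g. a 5% expected gain in crop efficiency
assumed_sd = 0.15             # assumed variability of the efficiency metric
effect_size = expected_increase / assumed_sd   # Cohen's d

# The smaller the expected increase, the larger the required sample per group.
n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="larger")
print(f"required farms per comparison group: {n_per_group:.0f}")
```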
C. Selection procedure: The respondents were picked randomly using a 'quota-based random sampling' procedure. Growers were first randomly selected and then checked to see whether they complied with the quotas for crops, region, farm size, etc. To avoid clustering a high number of interviews at one sampling point, interviewers were instructed to do a maximum of 5 interviews in one village.
BF screened from Japan were selected based on the following criteria:
Location: Hokkaido Tokachi (JA Memuro, JA Otofuke, JA Tokachi Shimizu, JA Obihiro Taisho) --> initially focus on Memuro, Otofuke, Tokachi Shimizu, Obihiro Taisho // Added locations in GGP 2015 due to change of RF: Obihiro, Kamikawa, Abashiri
BF: no use of in-furrow application (Amigo) - no use of Amistar
Contract farmers of snacks and other food companies --> screening question: 'Do you have quality contracts in place with snack and food companies for your potato production? Y/N' --> if no, screen out
Increase of marketable yield --> screening question: 'Are you interested in growing branded potatoes (premium potatoes for processing industry)? Y/N' --> if no, screen out
Potato growers for process use
Background info: No mention of Syngenta
Background info:
- Labor cost is a very serious issue: in general, labor cost in Japan is very high. Growers try to reduce labor cost by mechanization and would like to manage the share of labor cost in total production cost.
- Quality and yield driven
Face-to-face [f2f]
Data collection tool for 2019 covered the following information:
(A) PRE- HARVEST INFORMATION
PART I: Screening
PART II: Contact Information
PART III: Farm Characteristics
  a. Biodiversity conservation
  b. Soil conservation
  c. Soil erosion
  d. Description of growing area
  e. Training on crop cultivation and safety measures
PART IV: Farming Practices - Before Harvest
  a. Planting and fruit development - Field crops
  b. Planting and fruit development - Tree crops
  c. Planting and fruit development - Sugarcane
  d. Planting and fruit development - Cauliflower
  e. Seed treatment
(B) HARVEST INFORMATION
PART V: Farming Practices - After Harvest
  a. Fertilizer usage
  b. Crop protection products
  c. Harvest timing & quality per crop - Field crops
  d. Harvest timing & quality per crop - Tree crops
  e. Harvest timing & quality per crop - Sugarcane
  f. Harvest timing & quality per crop - Banana
  g. After harvest
PART VI: Other inputs - After Harvest
  a. Input costs
  b. Abiotic stress
  c. Irrigation
See all questionnaires in external materials tab
Data processing:
Kynetec uses SPSS (Statistical Package for the Social Sciences) for data entry, cleaning, analysis, and reporting. After collection, the farm data is entered into a local database, reviewed, and quality-checked by the local Kynetec agency. In the case of missing values or inconsistencies, farmers are re-contacted. In some cases, grower data is verified with local experts (e.g. retailers) to ensure data accuracy and validity. After country-level cleaning, the farm-level data is submitted to the global Kynetec headquarters for processing. In the case of missing values or inconsistencies, the local Kynetec office was re-contacted to clarify and resolve issues.
Quality assurance: Various consistency checks and internal controls are implemented throughout the entire data collection and reporting process in order to ensure unbiased, high-quality data.
• Screening: Each grower is screened and selected by Kynetec based on cluster-specific criteria to ensure a comparable group of growers within each cluster. This helps keep variability low.
• Evaluation of the questionnaire: The questionnaire aligns with the global objective of the project and is adapted to the local context (e.g. interviewers and growers should understand what is asked). Each year the questionnaire is evaluated based on several criteria and updated where needed.
• Briefing of interviewers: Each year, local interviewers - familiar with the local context of farming - are thoroughly briefed to fully comprehend the questionnaire and obtain unbiased, accurate answers from respondents.
• Cross-validation of the answers:
  o Kynetec captures all growers' responses through a digital data-entry tool. Various logical and consistency checks are automated in this tool (e.g. total crop size in hectares cannot be larger than farm size; a short sketch of such a check follows this list).
  o Kynetec cross-validates the answers of the growers in three different ways: 1. within the grower (check whether growers respond consistently during the interview); 2. across years (check whether growers respond consistently throughout the years); 3. within the cluster (compare a grower's responses with those of others in the group).
  o All of the above-mentioned inconsistencies are followed up by contacting the growers and asking them to verify their answers. The data is updated after verification. All updates are tracked.
• Check and discuss evolutions and patterns: Global evolutions are calculated, discussed and reviewed on a monthly basis jointly by Kynetec and Syngenta.
• Sensitivity analysis: A sensitivity analysis is conducted to evaluate the global results in terms of outliers, retention rates and overall statistical robustness. The results of the sensitivity analysis are discussed jointly by Kynetec and Syngenta.
• It is recommended that users interested in using the administrative level 1 variable in the location dataset use this variable with care and cross-check it with the postal code variable.
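The automated logic check mentioned in the list above (total crop size cannot exceed farm size) can be illustrated with a short sketch; the column names and example values are hypothetical, and this is not Kynetec's data-entry tool.

```python
# Sketch of an automated consistency check of the kind described above
# (hypothetical column names and values; not Kynetec's data-entry tool).
import pandas as pd

growers = pd.DataFrame({
    "grower_id":    [101, 102, 103],
    "farm_size_ha": [50.0, 12.0, 30.0],
    "crop_size_ha": [48.0, 15.0, 30.0],   # grower 102 reports more crop than farm
})

# Flag records violating the rule: total crop area must not exceed farm size.
flagged = growers[growers["crop_size_ha"] > growers["farm_size_ha"]]
print("growers to re-contact for verification:")
print(flagged[["grower_id", "farm_size_ha", "crop_size_ha"]])
```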
Due to the above-mentioned checks, irregularities in fertilizer usage data were discovered that had to be corrected:
For the 2014 data collection wave, respondents were asked to give a total estimate of the fertilizer NPK rates that were applied in the fields. From 2015 onwards, the questionnaire was redesigned to be more precise and to obtain data by individual fertilizer product. The new method of measuring fertilizer inputs leads to more accurate results but also makes a year-on-year comparison difficult. After evaluating several solutions to this problem, 2014 fertilizer usage (NPK input) was re-estimated by calculating a weighted average of fertilizer usage in the following years.
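The re-estimation step can be sketched as a weighted average over the following survey years; the weights, years used, and aggregation level below are illustrative assumptions, not the values actually applied.

```python
# Sketch of the 2014 NPK re-estimation as a weighted average of following years
# (the weights, years used, and aggregation level are illustrative assumptions).
import numpy as np
import pandas as pd

npk = pd.DataFrame({
    "cluster": ["A", "A", "B", "B"],
    "year":    [2015, 2016, 2015, 2016],
    "npk_kg_per_ha": [120.0, 110.0, 95.0, 90.0],
})
weights = {2015: 0.6, 2016: 0.4}   # hypothetical weights favouring the nearer year

est_2014 = (
    npk.assign(w=npk["year"].map(weights))
       .groupby("cluster")
       .apply(lambda g: np.average(g["npk_kg_per_ha"], weights=g["w"]))
)
print(est_2014)   # re-estimated 2014 NPK input per cluster
```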
Censuses are the principal means of collecting basic population and housing statistics required for social and economic development, policy interventions, and their implementation and evaluation. The census plays an essential role in public administration. The results are used to ensure:
• equity in distribution of government services
• distributing and allocating government funds among various regions and districts for education and health services
• delineating electoral districts at national and local levels, and
• measuring the impact of industrial development, to name a few
The census also provides the benchmark for all surveys conducted by the national statistical office. Without the sampling frame derived from the census, the national statistical system would face difficulties in providing reliable official statistics for use by government and the public. The census also provides information on small areas and population groups with minimum sampling errors. This is important, for example, in planning the location of a school or clinic. Census information is also invaluable for use in the private sector for activities such as business planning and market analyses. The information is used as a benchmark in research and analysis.
Census 2011 was the third democratic census to be conducted in South Africa. Census 2011 specific objectives included:
- To provide statistics on population, demographic, social, economic and housing characteristics;
- To provide a base for the selection of a new sampling frame;
- To provide data at the lowest geographical level; and
- To provide a primary base for the mid-year projections.
National
Households, Individuals
Census/enumeration data [cen]
Face-to-face [f2f]
About the Questionnaire: Much emphasis has been placed on the need for a population census to help government direct its development programmes, but less has been written about how the census questionnaire is compiled. The main focus of a population and housing census is to take stock and produce a total count of the population without omission or duplication. Another major focus is to be able to provide accurate demographic and socio-economic characteristics pertaining to each individual enumerated. Apart from individuals, the focus is on collecting accurate data on housing characteristics and services. A population and housing census provides data needed to facilitate informed decision-making as far as policy formulation and implementation are concerned, as well as to monitor and evaluate their programmes at the smallest area level possible. It is therefore important that Statistics South Africa collects statistical data that comply with the United Nations recommendations and other relevant stakeholder needs.
The United Nations underscores the following factors in determining the selection of topics to be investigated in population censuses: a) The needs of a broad range of data users in the country; b) Achievement of the maximum degree of international comparability, both within regions and on a worldwide basis; c) The probable willingness and ability of the public to give adequate information on the topics; and d) The total national resources available for conducting a census.
In addition, the UN stipulates that census-takers should avoid collecting information that is no longer required simply because it was traditionally collected in the past, but rather focus on key demographic, social and socio-economic variables.It becomes necessary, therefore, in consultation with a broad range of users of census data, to review periodically the topics traditionally investigated and to re-evaluate the need for the series to which they contribute, particularly in the light of new data needs and alternative data sources that may have become available for investigating topics formerly covered in the population census. It was against this background that Statistics South Africa conducted user consultations in 2008 after the release of some of the Community Survey products. However, some groundwork in relation to core questions recommended by all countries in Africa has been done. In line with users' meetings, the crucial demands of the Millennium Development Goals (MDGs) should also be met. It is also imperative that Stats SA meet the demands of the users that require small area data.
Accuracy of data depends on a well-designed questionnaire that is short and to the point. The interview to complete the questionnaire should not take longer than 18 minutes per household. Accuracy also depends on the diligence of the enumerator and honesty of the respondent. On the other hand, disadvantaged populations, owing to their small numbers, are best covered in the census and not in household sample surveys. Variables such as employment/unemployment, religion, income, and language are more accurately covered in household surveys than in censuses. Users'/stakeholders' input in terms of providing information in the planning phase of the census is crucial in making it a success. However, the information provided should be within the scope of the census.
Individual particulars
Section A: Demographics
Section B: Migration
Section C: General Health and Functioning
Section D: Parental Survival and Income
Section E: Education
Section F: Employment
Section G: Fertility (Women 12-50 Years Listed)
Section H: Housing, Household Goods and Services and Agricultural Activities
Section I: Mortality in the Last 12 Months
The Household Questionnaire is available in Afrikaans; English; isiZulu; IsiNdebele; Sepedi; SeSotho; SiSwati; Tshivenda; Xitsonga.
The Transient and Tourist Hotel Questionnaire (English) is divided into the following sections:
Name, Age, Gender, Date of Birth, Marital Status, Population Group, Country of birth, Citizenship, Province.
The Questionnaire for Institutions (English) is divided into the following sections:
Particulars of the institution
Availability of piped water for the institution
Main source of water for domestic use
Main type of toilet facility
Type of energy/fuel used for cooking, heating and lighting at the institution
Disposal of refuse or rubbish
Asset ownership (TV, Radio, Landline telephone, Refrigerator, Internet facilities)
List of persons in the institution on census night (name, date of birth, sex, population group, marital status, barcode number)
The Post Enumeration Survey Questionnaire (English)
These questionnaires are provided as external resources.
Data editing and validation system: The execution of each phase of Census operations introduces some form of error into Census data. Despite the quality assurance methodologies embedded in all the phases (data collection, data capturing (both manual and automated), coding, and editing), a number of errors creep in and distort the collected information. To promote consistency and improve data quality, editing is a paramount phase for identifying and minimising errors such as invalid values, inconsistent entries or unknown/missing values. The editing process for Census 2011 was based on defined rules (specifications).
The editing of Census 2011 data involved a number of sequential processes: selection of members of the editing team, review of Census 2001 and 2007 Community Survey editing specifications, development of editing specifications for the Census 2011 pre-tests (2009 pilot and 2010 Dress Rehearsal), development of firewall editing specifications and finalisation of specifications for the main Census.
Editing team: The Census 2011 editing team was drawn from various divisions of the organisation based on skills and experience in data editing. The team thus comprised subject-matter specialists (demographers and programmers), managers, as well as data processors.
The Census 2011 questionnaire was very complex, characterised by many sections, interlinked questions and skipping instructions. Editing such complex, interlinked data items required the application of a combination of editing techniques. Errors relating to structure were resolved using Structured Query Language (SQL) in the Oracle dataset. CSPro software was used to resolve content-related errors. The strategy used for Census 2011 data editing was the implementation of automated error detection and correction with minimal changes. Combinations of logical and dynamic imputation/editing were used. Logical imputations were preferred, and in many cases substantial effort was undertaken to deduce a consistent value based on the rest of the household's information. To profile the extent of changes in the dataset and assess the effects of imputation, a set of imputation flags is included in the edited dataset. Imputation flag values include the following:
0 - no imputation was performed; raw data were preserved
1 - logical editing was performed, raw data were blank
2 - logical editing was performed, raw data were not blank
3 - hot-deck imputation was performed, raw data were blank
4 - hot-deck imputation was performed, raw data were not blank
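The flags can be used to profile how much of a variable was edited versus left as raw data. The sketch below tabulates them with pandas; the flag column name and example values are hypothetical.

```python
# Sketch: tabulating Census 2011-style imputation flags to profile how much of a
# variable was edited (the column name `age_flag` and the values are hypothetical).
import pandas as pd

flag_labels = {
    0: "no imputation (raw preserved)",
    1: "logical edit, raw was blank",
    2: "logical edit, raw was not blank",
    3: "hot-deck imputation, raw was blank",
    4: "hot-deck imputation, raw was not blank",
}

census = pd.DataFrame({"age_flag": [0, 0, 1, 3, 0, 4, 2, 0, 3]})
summary = (census["age_flag"].map(flag_labels)
           .value_counts(normalize=True)
           .mul(100).round(1))
print(summary)   # percentage of records per imputation outcome
```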
Independent monitoring and evaluation of Census field activities Independent monitoring of the Census 2011 field activities was carried out by a team of 31 professionals and 381 Monitoring
The high-frequency phone survey of refugees monitors the economic and social impacts of, and responses to, the COVID-19 pandemic among refugees and nationals by calling a sample of households every four weeks. The main objective is to inform timely and adequate policy and program responses. Since the outbreak of the COVID-19 pandemic in Ethiopia, two rounds of data collection on refugees were completed between September and November 2020. The first round of the joint national and refugee HFPS was implemented between 24 September and 17 October 2020, and the second round between 20 October and 20 November 2020.
Household
Sample survey data [ssd]
The sample was drawn using a simple random sample without replacement. Expecting a high non-response rate based on experience from the HFPS-HH, we drew a stratified sample of 3,300 refugee households for the first round. More details on sampling methodology are provided in the Survey Methodology Document available for download as Related Materials.
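A stratified simple random sample without replacement can be sketched as below; the frame file, stratum definitions, and proportional allocation are placeholders, and the actual design is the one described in the Survey Methodology Document.

```python
# Sketch of a stratified simple random sample without replacement (the frame file,
# strata, and proportional allocation are placeholders; the actual design is in
# the Survey Methodology Document).
import pandas as pd

frame = pd.read_csv("refugee_frame.csv")      # hypothetical sampling frame
TOTAL_N = 3300                                # target number of refugee households

# Proportional allocation of the 3,300 households across strata.
alloc = (frame["stratum"].value_counts(normalize=True) * TOTAL_N).round().astype(int)

sample = (
    frame.groupby("stratum", group_keys=False)
         .apply(lambda g: g.sample(n=alloc[g.name], replace=False, random_state=42))
)
print(sample["stratum"].value_counts())
```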
Computer Assisted Telephone Interview [cati]
The Ethiopia COVID-19 High Frequency Phone Survey of Refugee questionnaire consists of the following sections:
A more detailed description of the questionnaire is provided in Table 1 of the Survey Methodology Document that is provided as Related Materials. The Round 1 and 2 questionnaires are available for download.
DATA CLEANING At the end of data collection, the raw dataset was cleaned by the Research team. This included formatting and correcting results based on monitoring issues, enumerator feedback and survey changes. The data cleaning carried out is detailed below.
Variable naming and labeling:
• Variable names were changed to reflect the lowercase question name in the paper survey copy, and a word or two related to the question.
• Variables were labeled with longer descriptions of their contents, and the full question text was stored in Notes for each variable.
• 'Other, specify' variables were named similarly to their related question, with '_other' appended to the name.
• Value labels were assigned where relevant, with options shown in English for all variables, unless preloaded from the roster in Amharic.
Variable formatting:
• Variables were formatted as their object type (string, integer, decimal, time, date, or datetime).
• Multi-select variables were saved both as space-separated single variables and as multiple binary variables showing the yes/no value of each possible response (a short expansion sketch follows this list).
• Time and date variables were stored as POSIX timestamp values and formatted to show Gregorian dates.
• Location information was left in separate ID and Name variables, following the format of the incoming roster. IDs were formatted to include only the variable-level digits, and not the higher-level prefixes (2-3 digits only).
• Only consented surveys were kept in the dataset, and all personal information and internal survey variables were dropped from the clean dataset.
• Roster data is separated from the main dataset and kept in long form, but can be merged on the key variable (the key can also be used to merge with the raw data).
• The variables were arranged in the same order as the paper instrument, with observations arranged according to their submission time.
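The multi-select storage described above can be sketched as keeping the space-separated variable and expanding it into one binary variable per response option; the variable name and codes below are hypothetical.

```python
# Sketch of the multi-select storage described above: keep the space-separated
# variable and expand it into one 0/1 indicator per response option.
# The variable name and option codes are hypothetical.
import pandas as pd

df = pd.DataFrame({"coping_strategies": ["1 3", "2", "1 2 4", "3 5"]})

# One binary column per selected option, prefixed like coping_strategies__2.
dummies = (df["coping_strategies"]
           .str.get_dummies(sep=" ")
           .add_prefix("coping_strategies__"))

clean = pd.concat([df, dummies], axis=1)
print(clean)
```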
Backcheck data review: Results of the backcheck survey are compared against the originally captured survey results using the bcstats command in Stata. This function delivers a comparison of variables and identifies any discrepancies. Any discrepancies identified are then examined individually to determine if they are within reason.
The following data quality checks were completed:
• Daily SurveyCTO monitoring: This included outlier checks, skipped questions, a review of 'Other, specify' and other text responses, and enumerator comments. Enumerator comments were used to suggest new response options or to highlight situations where existing options should be used instead. Monitoring also included a review of variable relationship logic checks and checks of the logic of answers. Finally, outliers in phone variables such as survey duration or the percentage of time audio was at a conversational level were monitored. A survey duration of close to 15 minutes and a conversation-level audio percentage of around 40% was considered normal.
• Dashboard review: This included monitoring individual enumerator performance, such as the number of calls logged, duration of calls, percentage of calls responded to and percentage of non-consents. Non-consent reason rates and attempts per household were monitored as well. Duration analysis using R was used to monitor each module's duration and estimate the time required for subsequent rounds. The dashboard was also used to track overall survey completion and preview the results of key questions.
• Daily Data Team reporting: The Field Supervisors and the Data Manager reported daily feedback on call progress, enumerator feedback on the survey, and any suggestions to improve the instrument, such as adding options to multiple-choice questions or adjusting translations.
• Audio audits: Audio recordings were captured during the consent portion of the interview for all completed interviews, for the enumerators' side of the conversation only. The recordings were reviewed for any surveys flagged by enumerators as having data quality concerns and for an additional random sample of 2% of respondents. A range of lengths was selected to observe edge cases. Most consent readings took around one minute, with some longer recordings due to questions on the survey or holding for the respondent. All reviewed audio recordings were completed satisfactorily.
• Back-check survey: Field Supervisors made back-check calls to a random sample of 5% of the households that completed a survey in Round 1. Field Supervisors called these households and administered a short survey, including (i) identifying the same respondent; (ii) determining the respondent's position within the household; (iii) confirming that a member of the data collection team had completed the interview; and (iv) a few questions from the original survey.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
http://creativecommons.org/licenses/
The ISOTOPE database stores compiled age and isotopic data from a range of published and unpublished (GA and non-GA) sources. This internal database is only publicly accessible through the webservices given as links on this page. This data compilation includes sample and bibliographic links. The data structure currently supports summary ages (e.g., U-Pb and Ar/Ar) through the INTERPRETED_AGES tables, as well as extended system-specific tables for Sm-Nd, Pb-Pb, Lu-Hf and O isotopes. The data structure is designed to be extensible, to adapt to evolving requirements for the storage of isotopic data. ISOTOPE and the data holdings were initially developed as part of the Exploring for the Future (EFTF) program. During development of ISOTOPE, some key considerations in compiling and storing diverse, multi-purpose isotopic datasets were developed:
1) Improved sample characterisation and bibliographic links. Often, the usefulness of an isotopic dataset is limited by the metadata available for the parent sample. Better harvesting of fundamental sample data (and better integration with related national datasets such as Australian Geological Provinces and the Australian Stratigraphic Units Database) simplifies the process of filtering an isotopic data compilation using spatial, geological and bibliographic criteria, as well as facilitating 'audits' targeting missing isotopic data.
2) Generalised, extensible structures for isotopic data. The need for system-specific tables for isotopic analyses does not preclude the development of generalised data structures that reflect universal relationships. GA has modelled relational tables linking system-specific Sessions, Analyses, and interpreted data-Groups, which has proven adequate for all of the Isotopic Atlas layers developed thus far.
3) Dual delivery of 'derived' isotopic data. In some systems, it is critical to capture the published data (i.e. isotopic measurements and derived values, as presented by the original author) and generate an additional set of derived values from the same measurements, calculated using a single set of reference parameters (e.g. decay constant, depleted-mantle values, etc.) that permit 'normalised' portrayal of the data compilation-wide.
4) Flexibility in data delivery mode. In radiogenic isotope geochronology (e.g. U-Pb, Ar-Ar), careful compilation and attribution of 'interpreted ages' can meet the needs of much of the user base, even without an explicit link to the constituent analyses. In contrast, isotope geochemistry (especially microbeam-based methods such as Lu-Hf via laser ablation) is usually focused on the individual measurements, without which interpreted 'sample averages' have limited value. Data delivery should reflect key differences of this kind.
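The generalised Sessions, Analyses, and interpreted data-Groups pattern in point 2 can be sketched as a small relational schema; the table and column names below are invented for illustration and are not GA's actual Oracle schema.

```python
# Illustrative relational sketch of the generalised Sessions -> Analyses ->
# interpreted data-Groups pattern described above (not GA's actual schema;
# table and column names are invented for illustration).
import sqlite3

schema = """
CREATE TABLE sessions (
    session_id     INTEGER PRIMARY KEY,
    sample_no      TEXT NOT NULL,        -- link to sample metadata
    isotope_system TEXT NOT NULL,        -- e.g. 'U-Pb', 'Lu-Hf', 'Sm-Nd'
    reference      TEXT                  -- bibliographic link
);
CREATE TABLE analyses (
    analysis_id    INTEGER PRIMARY KEY,
    session_id     INTEGER NOT NULL REFERENCES sessions(session_id),
    measured_value REAL,
    uncertainty    REAL
);
CREATE TABLE interpreted_groups (
    group_id           INTEGER PRIMARY KEY,
    session_id         INTEGER NOT NULL REFERENCES sessions(session_id),
    interpreted_age_ma REAL,              -- summary value, e.g. an interpreted age
    derived_normalised REAL               -- value recalculated with common parameters
);
"""

con = sqlite3.connect(":memory:")
con.executescript(schema)
print("tables:", [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```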
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset offers a comprehensive overview of rural technological trajectories and variants at the municipal level across the Brazilian Northern Region between 2006 and 2017. Analyzing the data will provide valuable insights into how different municipalities employ their particular production modes, Federative States, and Mesoregions in their technological trajectory, as well as whether peasantry is present in those areas. Not only does this dataset offer detailed views from multiple dimensions of rural production, but it also provides an understanding of how technology has developed over time in tropical regions. Investigating this data could open up various possibilities for recognizing patterns across multiple variables relating to innovation, growth opportunities, and other relevant economic phenomena, laid out within a comprehensive visual narrative that is both thought-provoking and eye-catching.
This dataset provides information on the technological trajectories and variants in the Brazilian Northern Region at the municipal level. With this data, users can gain insights into the production modes and technological trajectories of various municipalities in Northern Brazil. Here is how you can use this dataset:
- Identify what production mode is most common in the region: Through exploring and analyzing the data, users can identify which production mode is most prevalent in each municipality of North Brazil. This can help identify trends or preferences within each area as well as differentiating between rural/agricultural areas and urban/industrialized regions of North Brazil.
- Analyze technological trajectory trends: Comparing data from 2006 to 2017, users can determine whether a municipality has shifted its technology trajectory over time or stayed consistent throughout both years (i.e., no changes have occurred). Examining more than one year's worth of data side by side also allows easy comparison between different technology trajectories across multiple districts, as well as over time within a single district/municipality.
- Analyze peasantry presence: Looking at which federative states have peasantry present versus those that do not reveals additional insight when combined with the Production Mode and Technological Trajectory variables, for example to observe whether certain production modes make certain municipalities more predisposed to having peasants present than others, or vice versa, due to historical factors such as colonization date.
- Gain insight into the Mesoregion: The Mesoregion column provides many interesting insights, such as showing which states contain multiple municipalities with similar technology-trajectory types. This gives an easier view of regional differences than focusing on single-state differences alone and allows for a far better comparative analysis than a statewide level only, since mesoregions often span multiple states owing to differing size constraints.
- Identifying and mapping technological trajectories for rural development in the Brazilian Northern Region over time.
- Determining economic and geographical disparities in the region using variables such as peasantry, production mode, and technological trajectory.
- Analyzing how changes in technology, demographics, geography, and politics have influenced peasant livelihoods over time.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: 2006_NorthRegion_TechVariants.csv

| Column name | Description |
|:------------|:------------|
| Municipality | Name of the municipality. (String) |
...
For the Hyundai GPS update, click on the link: Hyundai GPS Map Update - https://navisolve.com/
Modern vehicles like Hyundai are increasingly integrating advanced technology to enhance the driving experience, and one of the most vital tools in this evolution is the in-built GPS navigation system. A well-functioning GPS is essential not only for convenience but also for safety. It guides drivers through unfamiliar roads, provides alternate routes during traffic jams, and offers essential data like nearby fuel stations or points of interest. However, for this system to remain accurate and useful, the maps it relies on must be regularly updated.
Many drivers are unaware that the roads, businesses, and even speed limits change over time, rendering old maps less reliable. Updating the GPS maps in your Hyundai ensures that your navigation system remains accurate, efficient, and trustworthy. This guide walks you through the process of updating your Hyundai's GPS system, simplifying what may initially seem like a technical task.
Understanding Your Hyundai GPS System
Hyundai vehicles are equipped with a navigation system that runs on a dedicated infotainment interface. The GPS data is stored either on an internal hard drive, a memory card, or in some models, an SD card. This system is designed to support periodic updates, typically provided by Hyundai or through the official Hyundai navigation update portal.
Before attempting an update, it's important to determine the type of system your vehicle uses. Hyundai models vary by year and region, and their infotainment units may differ. Some newer models may even support wireless updates or Over-the-Air (OTA) services, eliminating the need for manual installations. For most vehicles, however, the process requires the use of a computer and a USB drive or SD card.
Preparing for the Update
The first step in updating your Hyundai GPS map is preparation. You will need access to a computer, a high-speed internet connection, and a USB drive or SD card with enough storage capacity. Typically, a drive with at least 16GB of space is required to download and transfer the necessary files.
Ensure that your vehicle is parked in a safe location with the engine running or in accessory mode when you initiate the transfer process later. Interruptions during data installation can cause system errors, so it's best to allow uninterrupted time for the process.
Accessing the Update Software
Hyundai uses a dedicated platform to manage map updates for its vehicles. Through this platform, users can download a software tool compatible with their computer's operating system. Once installed, the tool will guide you through selecting your vehicle model and the specific infotainment version it uses.
This software is intuitive and user-friendly, designed for both tech-savvy users and those less familiar with digital tools. After identifying your vehicle, the tool will detect the appropriate update file, which can be quite large depending on the region and version.
Downloading the Map Update
Once the software determines the correct update package, the next step is downloading the map files. This can take some time, depending on your internet speed and the size of the files. It's recommended to avoid multitasking on your computer during this time to ensure a smooth download.
After the files are downloaded, the tool will format the USB drive or SD card and transfer the update files to it. Be aware that formatting the drive will erase all existing data, so ensure you have backed up any important files beforehand.
Installing the Update in Your Hyundai Vehicle
With the update files ready, return to your vehicle and insert the USB drive or SD card into the designated port. On the infotainment screen, navigate to the settings or system update section. The system should automatically detect the update files and prompt you to begin the installation.
Follow the on-screen instructions carefully. The installation process can take anywhere from 30 minutes to over an hour, depending on the system and the update size. During this time, do not turn off the engine or remove the USB drive/SD card. Any interruption could potentially corrupt the update or damage the navigation system.
Once the update is complete, the system will typically reboot and apply the new map data. It's a good idea to check the version number in your settings menu afterward to confirm that the update was successful.
After the Update: What to Expect
With the latest map data installed, your Hyundai GPS system should now reflect recent road changes, new routes, and updated points of interest. You might also notice improved system performance or added features, especially if the update includes firmware enhancements.
Drivers often report smoother navigation experiences and fewer instances of being misdirected after a successful update. Features like real-time traffic data and speed limit alerts may also perform more accurately with updated maps.
Maintaining Your GPS System
Hyundai typically recommends checking for updates at least once a year, though more frequent checks can be beneficial if you frequently drive through new developments or construction zones. Keeping your GPS system current not only improves navigation but also adds to the resale value of your vehicle, as a properly maintained infotainment system is a key selling point.
Some newer Hyundai models offer automatic updates via a connected service, which simplifies the process even further. However, for vehicles that rely on manual updates, following this guide will ensure that you are always driving with the most reliable navigation data available.
Final Thoughts
Updating your Hyundai GPS map doesn't have to be a daunting task. With the right tools and a bit of preparation, it can be a straightforward process that significantly enhances your driving experience. In a world where roads are constantly changing and digital tools are becoming essential, keeping your vehicle's navigation system up to date is not just a matter of convenience; it's a necessity. Taking the time to ensure your maps are current can save you time, reduce stress, and keep you safe on the road.
Read More:
GPS Map Update - https://gpsmapupdats.readthedocs.io/en/latest/
Garmin GPS Map Update - https://garmin-gps.readthedocs.io/en/latest/
TomTom GPS Map Update - https://tomtom-gps.readthedocs.io/en/latest/
Rand McNally GPS Map Update - https://rand-mcnally-gps-map-update.readthedocs.io/en/latest/
Hyundai GPS Map Update - https://hyundaigpsmapupdate.readthedocs.io/en/latest/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: Population surveys are vital for wildlife management, yet traditional methods are typically effort-intensive, leading to data gaps. Modern technologies, such as drones, facilitate field surveys but increase the data analysis burden. Citizen Science (CS) can alleviate this issue by engaging non-specialists in data collection and analysis. We evaluated this approach for population monitoring using the endangered Galápagos marine iguana as a case study, assessing citizen scientists' ability to detect and count animals in aerial images. Comparing against a Gold Standard dataset of expert counts in 4345 images, we explored optimal aggregation methods for CS inputs and evaluated the accuracy of CS counts. During three phases of our project, hosted on Zooniverse.org, over 13,000 volunteers made 1,375,201 classifications from 57,838 images, each image being independently classified up to 30 times. Volunteers achieved 68% to 94% accuracy in detecting iguanas, with more false negatives than false positives. Image quality strongly influenced accuracy; by excluding data from suboptimal pilot-phase images, volunteers counted with 91% to 92% accuracy. For detecting iguanas, the standard 'majority vote' aggregation approach (where the answer selected is that given by the majority of individual inputs) produced less accurate results than when a minimum threshold of five (from the total independent classifications) was used. For counting iguanas, HDBSCAN clustering yielded the best results. We conclude that CS can accurately identify and count marine iguanas from drone images, though there is a tendency to underestimate. CS-based data analysis is still resource-intensive, underscoring the need to develop a Machine Learning approach.

Methods: We created a citizen science project, named Iguanas from Above, on Zooniverse.org. There, we uploaded 'sliced' images from drone imagery of several colonies of the Galápagos marine iguana. Citizen scientists (CS) were asked to classify the images in two tasks: first, to say yes or no to iguana presence in the image, and second, to count the individuals when present. Each image was classified by 20 or 30 volunteers. Once all the images corresponding to the three launched phases were classified, we downloaded the data from the Zooniverse portal and used the Panoptes Aggregation python package to extract and aggregate the CS data (source code: https://github.com/cwinkelmann/iguanas-from-above-zooniverse). We randomly selected 5-10% of all the images to create a Gold Standard (GS) dataset. Three experts from the research team identified the presence and absence of marine iguanas in these images and counted them. The consensus answers are presented in this dataset and are referred to as expert data. The aggregated CS data from Task 1 (the total number of yes and no answers per image) were accepted as indicating iguana presence when 5 or more volunteers (of the 20-30) selected yes (a minimum-threshold rule); otherwise absence was accepted. Then, we compared all CS accepted answers against the expert data, as correct or incorrect, and calculated a percentage of CS accuracy for marine iguana detection. For Task 2, we selected all the images identified by the volunteers as having iguanas under this minimum-threshold rule and aggregated (summarized) all classifications into one value (count) per image using the statistical metrics median and mode and the spatial clustering methods DBSCAN and HDBSCAN. The rest of the images obtained 0 counts.

CS data were incorporated into this dataset. We then compared the total counts in this GS dataset calculated by the expert and by all the aggregating methods used, in terms of percentages of agreement with the expert data. These percentages show CS accuracy for marine iguana counting. We also investigated the numbers of marine iguanas under- and overestimated with all aggregating methods. Finally, by applying generalized linear models, we used this dataset to explore statistical differences among the different methods used to count marine iguanas (expert, median, mode and HDBSCAN) in the images and how the factors phase analyzed, quality of the images (assessed by the experts), and number of marine iguanas present in the image could affect CS accuracy.
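The two aggregation ideas, a minimum-threshold detection rule and HDBSCAN-based counting of volunteer marks, can be sketched as below. This is not the project's Panoptes Aggregation workflow; the threshold of five is taken from the text, while the hdbscan package, min_cluster_size, and the toy click coordinates are assumptions for illustration.

```python
# Illustration of the two aggregation ideas described above: a minimum-threshold
# rule for detection and HDBSCAN clustering of volunteer marks for counting.
# Not the project's Panoptes Aggregation workflow; parameters are illustrative.
import numpy as np
import hdbscan

def iguanas_present(yes_votes: int, threshold: int = 5) -> bool:
    """Accept presence when at least `threshold` of the 20-30 volunteers said yes."""
    return yes_votes >= threshold

def count_iguanas(marks_xy: np.ndarray, min_cluster_size: int = 3) -> int:
    """Cluster all volunteers' click coordinates for one image; each cluster
    (label != -1, i.e. not noise) is treated as one iguana."""
    if len(marks_xy) == 0:
        return 0
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(marks_xy)
    return len(set(labels) - {-1})

# Toy example: clicks from several volunteers around two animals plus one stray click.
clicks = np.array([[101, 52], [103, 50], [99, 54],      # animal 1
                   [240, 180], [242, 183], [238, 179],  # animal 2
                   [400, 10]])                           # stray click (noise)
print(iguanas_present(yes_votes=12), count_iguanas(clicks))
```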
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises 153 subfolders within a primary directory named data, derived from 85 participants. Each participant typically contributes 2-3 subfolders, contingent on the completeness and quality of their M-mode echocardiography (UCG) recordings. Subfolder names follow the format hdata + SubjectID + EJ/XJ/ZJ to denote the specific cardiac region captured in the ultrasound data: EJ denotes M-mode imaging of the mitral valve, XJ denotes M-mode imaging of the left ventricle, and ZJ denotes M-mode imaging of the aortic valve. For instance, a participant with identifier "001" may have subfolders named hdata1EJ, hdata1XJ, and/or hdata1ZJ, corresponding to each available M-mode echocardiographic segment. Each subfolder contains five distinct files, described in detail below.

1 BCG J-peak file
(1) File name: hdata+subjectID+EJ/XJ/ZJ_BCG.csv
(2) Content: J-peak positions in the BCG signal, presented in two columns:
(3) The first column provides the raw data point index.
(4) The second column specifies the corresponding time (in seconds) for each J-peak.

2 ECG R-peak file
(1) File name: hdata+subjectID+EJ/XJ/ZJ_ECG.csv
(2) Content: R-peak positions in the ECG signal, also in two columns:
(3) The first column provides the raw data point index.
(4) The second column specifies the corresponding time (in seconds) for each R-peak.

3 Ultrasound video
(1) File name: hdata+subjectID+EJ/XJ/ZJ_UCG.AVI
(2) Content: An AVI-format video of the simultaneously acquired M-mode echocardiogram. The suffix EJ, XJ, or ZJ indicates whether the imaging targeted the mitral valve, left ventricle, or aortic valve, respectively.

4 Signal data
(1) File name: signal.csv
(2) Content: Three columns of time-series data sampled at 100 Hz: the raw BCG signal (column 1); ECG data (Lead V2 or another designated lead) (column 2); and the denoised BCG signal (column 3), derived using the Enhanced Singular Value Thresholding (ESVT) algorithm.

5 Signal visualization
(1) File name: signal.pdf
(2) Content: A graphical representation of the signals from signal.csv. This file facilitates quick inspection of waveform alignment and overall signal quality.

In addition to the data directory, an Additional_info folder provides participant demographic and clinical details. Each row in subject_info.csv corresponds to an individual participant, listing their ID, sex, weight, height, age, heart rate, and ejection fraction (EF) (%). These parameters establish an informative link between each participant's anthropometric profile, cardiac function metrics, and the corresponding BCG, ECG, and ultrasound data.
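A minimal loading sketch for one subfolder is given below. It assumes the CSV files have no header row, as implied by the column descriptions above; adjust the header handling if the files do contain one.

```python
# Minimal loading sketch for one subfolder (e.g. data/hdata1XJ). Assumes the CSV
# files have no header row, as implied by the column descriptions above.
import pandas as pd

folder = "data/hdata1XJ"

signal = pd.read_csv(f"{folder}/signal.csv", header=None,
                     names=["bcg_raw", "ecg", "bcg_denoised"])   # 100 Hz samples
bcg_peaks = pd.read_csv(f"{folder}/hdata1XJ_BCG.csv", header=None,
                        names=["sample_index", "time_s"])        # J-peak locations
ecg_peaks = pd.read_csv(f"{folder}/hdata1XJ_ECG.csv", header=None,
                        names=["sample_index", "time_s"])        # R-peak locations

fs = 100  # Hz sampling rate stated in the description
signal["time_s"] = signal.index / fs

print(signal.head(), bcg_peaks.head(), ecg_peaks.head(), sep="\n")
```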
The Addis Ababa Industrial Survey 1993 is the first wave of a panel data set on a sample of firms within the Ethiopian manufacturing sector, all based in the Addis Ababa region. The first round of the survey was undertaken between September and December 1993. The questionnaire structure and types of data collected were designed to be consistent with other African manufacturing sector surveys carried out under the Regional Program on Enterprise Development (RPED) organised by the World Bank. The survey covers 220 firms that were selected on a random basis from manufacturing establishments in the Addis Ababa region, of which 30 are public enterprises. The firms constitute a panel which was intended to be broadly representative of the size distribution of firms across the major sectors of Ethiopia's manufacturing industry. These sectors include food processing, textiles and garments, paper products and furniture, metal products and machinery.
Data on each firm was collected at two levels: firm level information relating to the years 1992/1993; and for each firm, information on a sub-sample of their workers & apprentices for 1993.
The dataset is complete, apart from some additional data files relating to firms' financial structure and performance (Section 5 of Questionnaire) which are still in the process of compilation and will be added to this first release as soon as possible.
The City of Addis Ababa
Enterprise/establishment
The survey covered a sample of 220 firms within the Ethiopian manufacturing sector, all based in the Addis Ababa region.
Sample survey data [ssd]
Face-to-face [f2f]
The dataset presented here has been extracted from a detailed questionnaire conducted with the owners/senior managers and, for relevant sections, workers of the sampled manufacturing firms.
The original questionnaire was designed by a team from the World Bank, as part of the Regional Program on Enterprise Development (RPED) and therefore the questionnaire structure and types of data collected are consistent with other African manufacturing sector surveys carried out by RPED. The overall questionnaire has been divided into a number of sections, grouping questions related to different aspects of firm-level structure and performance and also a section of supplementary labour market information gathered from interviews with a sample of workers within each firm.
These sections are organised as follows in the Wave I questionnaire:
Entrepreneurship Questionnaire
General Firm Questionnaire
Technology Questionnaire
Labour Markets Questionnaire
Appendix to Labour Markets Questionnaire: Survey for a Sub-Sample of Workers
Financial Markets & Contractual Relations Questionnaire
Dispute Resolution Questionnaire
Infrastructure Questionnaire
Regulation Questionnaire
Business Support Services Questionnaire
Entrepreneurial Strategies Questionnaire
The data files have been made available by CSAE in the form in which they were received from the survey organisers, without any additional work having been undertaken to clean the data and check for consistency. It is suggested, therefore, that potential users run their own basic consistency checks on the elements of the dataset in which they are interested before using the data in further analytical work.
DataFirst has improved the usability of the dataset by merging the 41 original files into one data file, using the unique firm id number, and providing variable and value labels. The original data files are also available as licensed data from info@data1st.org
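As an illustration of the kind of merge described above (not DataFirst's actual procedure), a minimal pandas sketch joining per-section files on a shared firm identifier might look as follows; the file and column names are hypothetical.

```python
# Illustrative only: merge several per-section files on a shared firm id.
# File names and the "firm_id" column are hypothetical placeholders.
import functools
import pandas as pd

section_files = ["general.csv", "technology.csv", "labour.csv"]  # ... up to 41 files
frames = [pd.read_csv(f) for f in section_files]

merged = functools.reduce(
    lambda left, right: left.merge(right, on="firm_id", how="outer"),
    frames,
)
merged.to_csv("addis_ababa_1993_merged.csv", index=False)
```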
Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
This dataset includes raw and processed data on glycerol, obtained through Nuclear Magnetic Resonance (NMR) and Gas Chromatography-Mass Spectrometry (GCMS) analyses. The dataset is intended for analyzing the purity of animal- and plant-based glycerol.
NMR data. Technique used: proton (1H) NMR and carbon (13C) NMR. Instrumentation: Bruker NMR 500, with proton measurements at 500 MHz and 24 °C, to determine the carbon structure. Sample preparation: the solvent used was deuterated methanol. Spectral data: chemical shifts (values in ppm for prominent resonances), coupling constants (J-values for significant couplings), and integration values (relative integrals for proton environments). Processed data: standard glycerol NMR spectra (PNG).
GCMS data. Technique used: gas chromatography coupled with mass spectrometry. Instrumentation: Agilent 7890B GC with an Agilent 5977A Series GC/MSD system. The column was an Agilent J&W DB-Wax UI (30 m × 0.25 mm × 0.25 µm). Helium was the carrier gas, with an inlet temperature of 250 °C in split mode (20:10). The GC oven started at 40 °C and was raised to 65 °C at a rate of 20 °C/min without holding, then raised to 250 °C at a rate of 50 °C/min. No solvent delay was applied. The mass spectrum was obtained by ionization with the detector temperature set at 230 °C, operated in scan mode from 29 to 400 amu. Sample preparation: each sample was injected with a volume of 1 mL and examined in triplicate. Chromatographic data: retention times of 9.929 to 9.938 s, with a single peak detected. Mass spectral data: mass-to-charge ratios (m/z) of 61, 43, 44, 45, 29, 15, and 31. Processed data: chromatogram of animal- and plant-based glycerol (PNG).
https://user-images.githubusercontent.com/91852182/147305077-8b86ec92-ed26-43ca-860c-5812fea9b1d8.gif
Self-driving cars have become a trending subject, with significant improvements in the underlying technologies over the last decade. The purpose of the project is to train a neural network to drive an autonomous car agent on the tracks of Udacity's Car Simulator environment. Udacity has released the simulator as open source software, and enthusiasts have hosted a competition (challenge) to teach a car how to drive using only camera images and deep learning. Driving a car autonomously requires learning to control the steering angle, throttle, and brakes. A behavioral cloning technique is used to mimic human driving behavior in training mode on the track. That is, a dataset is generated in the simulator by a user-driven car in training mode, and the deep neural network model then drives the car in autonomous mode. Ultimately, the car was able to run on Track 1, generalizing well. The project aims to reach the same accuracy on real-time data in the future.
https://user-images.githubusercontent.com/91852182/147298831-225740f9-6903-4570-8336-0c9f16676456.png
Udacity released an open source simulator for self-driving cars to depict a real-time environment. The challenge is to mimic the driving behavior of a human on the simulator with the help of a model trained by deep neural networks. The concept is called Behavioral Cloning, to mimic how a human drives. The simulator contains two tracks and two modes, namely, training mode and autonomous mode. The dataset is generated from the simulator by the user, driving the car in training mode. This dataset is also known as the "good" driving data. This is followed by testing on the track, seeing how the deep learning model performs after being trained by that user data.
https://user-images.githubusercontent.com/91852182/147298261-4d57a5c1-1fda-4654-9741-2f284e6d0479.png
The problem is solved in the following steps:
The technologies used in the implementation of this project, and the motivation behind using them, are described in this section.
TensorFlow: This is an open-source library for dataflow programming, widely used for machine learning applications. It also serves as a math library and supports large-scale computation. For this project, Keras, a high-level API that uses TensorFlow as its backend, is used. Keras facilitates building models easily, as it is more user friendly.
Several libraries are available in Python that help with machine learning projects, and a few of them improved the performance of this project; they are mentioned in this section. First, NumPy provides a collection of high-level math functions to support multi-dimensional matrices and arrays. It is used for faster computation over the weights (gradients) in neural networks. Second, scikit-learn is a machine learning library for Python that features different algorithms and machine learning function packages. Another is OpenCV (Open Source Computer Vision Library), which is designed for computational efficiency with a focus on real-time applications. In this project, OpenCV is used for image preprocessing and augmentation techniques.
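As a hedged example of the kind of OpenCV preprocessing and augmentation referred to above, a short sketch is given below; the crop window, target size, and colour space are assumptions rather than the project's verified settings.

```python
# Illustrative preprocessing/augmentation helpers; crop bounds, the 200x66
# target size, and the YUV colour space are assumptions, not verified settings.
import cv2
import numpy as np


def preprocess(image: np.ndarray) -> np.ndarray:
    """Crop away sky and hood, resize, and convert colour space before the CNN."""
    cropped = image[60:135, :, :]             # keep the road region (assumed bounds)
    resized = cv2.resize(cropped, (200, 66))  # NVIDIA-style input size (assumption)
    return cv2.cvtColor(resized, cv2.COLOR_BGR2YUV)


def augment(image: np.ndarray, steering: float):
    """Random horizontal flip: mirrored image with the steering angle negated."""
    if np.random.rand() < 0.5:
        return cv2.flip(image, 1), -steering
    return image, steering
```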
The project makes use of a Conda environment; Conda is an open-source package and environment manager for Python that simplifies package management and deployment and suits large-scale data processing. The machine on which this project was built is a personal computer.
A CNN is a type of feed-forward neural network computing system that can be used to learn from input data. Learning is accomplished by determining a set of weights or filter values that allow the network to model the behavior according to the training data. The desired output and the output generated by a CNN initialized with random weights will be different. This difference (the generated error) is backpropagated through the layers of the CNN to adjust the weights of the neurons, which in turn reduces the error and allows us to produce output closer to the desired one.
A CNN is good at capturing hierarchical and spatial structure from images. It utilizes filters that look at regions of an input image with a defined window size and map them to some output. It then slides the window by some defined stride to other regions, covering the whole image. Each convolutional filter layer thus captures the properties of the input image hierarchically in a series of subsequent layers, capturing details such as lines in the image, then shapes, then whole objects in later layers. A CNN is therefore a good fit for taking the images of a dataset and classifying them into their respective classes.
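To make the idea concrete, a small illustrative Keras CNN for this kind of task (regressing a steering angle from an input image) is sketched below; the layer sizes are assumptions and not the project's final architecture.

```python
# Illustrative CNN only: a small convolutional regressor for a steering angle.
# Layer sizes and the 66x200x3 input shape are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(66, 200, 3)),
    layers.Conv2D(24, (5, 5), strides=2, activation="relu"),
    layers.Conv2D(36, (5, 5), strides=2, activation="relu"),
    layers.Conv2D(48, (5, 5), strides=2, activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(1),                      # steering angle (regression output)
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```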
Another type of layer sometimes used in deep learning networks is the Time-Distributed layer. Time-Distributed layers are provided in Keras as wrapper layers; the wrapped layer is applied to every temporal slice of an input. The input is required to be at least three-dimensional, with the first dimension after the batch treated as the temporal dimension. A Time-Distributed wrapper can be applied to a Dense layer, acting on each timestep independently, or even used with convolutional layers. The way they are written in Keras is also simple, as shown in Figure 1 and Figure 2 (and in the sketch that follows Figure 2).
https://user-images.githubusercontent.com/91852182/147298483-4f37a092-7e71-4ce6-9274-9a133d138a4c.png
Fig. 1: TimeDistributed Dense layer
https://user-images.githubusercontent.com/91852182/147298501-6459d968-a279-4140-9be3-2d3ea826d9f6.png
Fig. 2: TimeDistributed Convolution layer
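As a hedged sketch of what Figures 1 and 2 illustrate, the two wrappers can be written in Keras as follows; the input shapes are arbitrary examples, not values taken from the project.

```python
# TimeDistributed wrappers applied to a Dense layer and a Conv2D layer;
# shapes below are arbitrary examples.
from tensorflow.keras import layers, models

# TimeDistributed Dense: input (batch, timesteps, features)
dense_model = models.Sequential([
    layers.Input(shape=(10, 16)),
    layers.TimeDistributed(layers.Dense(8)),
])

# TimeDistributed Convolution: input (batch, timesteps, height, width, channels)
conv_model = models.Sequential([
    layers.Input(shape=(10, 66, 200, 3)),
    layers.TimeDistributed(layers.Conv2D(24, (5, 5), activation="relu")),
])

print(dense_model.output_shape)   # (None, 10, 8)
print(conv_model.output_shape)    # (None, 10, 62, 196, 24)
```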
We will first download the simulator to start our behavioural training process. Udacity has built a simulator for self-driving cars and made it open source so that enthusiasts can work on something close to a real-time environment. It is built on Unity, the video game development platform. The simulator has configurable resolution and control settings and is very user friendly. The graphics and input configurations can be changed according to user preference and machine configuration, as shown in Figure 3. The user pushes the "Play!" button to enter the simulator user interface. You can enter the Controls tab to explore the keyboard controls, which are quite similar to those of a racing game, as seen in Figure 4.
https://user-images.githubusercontent.com/91852182/147298708-de15ebc5-2482-42f8-b2a2-8d3c59fceff4.png
Fig. 3: Configuration screen
https://user-images.githubusercontent.com/91852182/147298712-944e2c2d-e01d-459b-8a7d-3c5471bea179.png
Fig. 4: Controls Configuration
The first actual screen of the simulator can be seen in Figure 5, and its components are discussed below. The simulator involves two tracks. One of them can be considered simple and the other complex, as is evident in the screenshots in Figure 6 and Figure 7. The word "simple" here just means that the track has fewer curves and is easier to drive on (refer to Figure 6). The "complex" track has steep elevations, sharp turns, and a shadowed environment, and is tough to drive on, even for a user doing it manually (refer to Figure 7). There are two modes for driving the car in the simulator: (1) training mode and (2) autonomous mode. The training mode gives you the option of recording your run and capturing the training dataset. The small red sign at the top right of the screen in Figures 6 and 7 indicates that the car is being driven in training mode. The autonomous mode can be used to test the models to see whether they can drive on the track without human intervention. Also, if you try to press the controls to get the car back on track, it will immediately notify you that it has shifted to manual controls. The mode screen can be seen in Figure 8. Once we have mastered how the car is controlled in the simulator using the keyboard keys, we can start using the record button to collect data. We will save the data from it in a specified folder.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using CERNER records of pregnancies monitored at St. Mary's Hospital in London from April 2016 to November 2019, we carried out a retrospective observational study. The initial search returned 26,063 patients with the following variables: postcode, height, weight, BMI at booking, ethnicity (self-reported), parity, offer of a glucose tolerance test, results of the test (0 minutes and 120 minutes after a 75 g glucose load), mode of delivery, estimated total blood loss, gestational age, neonatal birthweight, admission to a SCBU, length of stay after delivery, foetal sex, and stillbirth. Prior to analysis, patients with missing values for one or more of the key variables were removed from the dataset; we did not attempt to impute missing data. Significantly outlying results were adjusted; when re-examination of the original patient data was not possible, those records were eliminated. Inconsistencies in units of measurement were fixed.
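A minimal pandas sketch of the complete-case exclusion described above is given below for illustration; the file name and column names are hypothetical placeholders, not the study's actual CERNER field names.

```python
# Illustrative complete-case filtering (no imputation); names are placeholders.
import pandas as pd

key_variables = ["bmi_at_booking", "ethnicity", "parity",
                 "ogtt_0_min", "ogtt_120_min", "mode_of_delivery",
                 "gestational_age", "neonatal_birthweight"]

records = pd.read_csv("cerner_extract.csv")                  # placeholder file name
complete_cases = records.dropna(subset=key_variables)        # drop patients missing any key variable
print(len(records) - len(complete_cases), "records excluded")
```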
This data set contains the best estimates of the total ion density from Voyager 2 at Jupiter in the PLS voltage range (10-5950 eV/Q). It is calculated using the method of McNutt et al. (1981), which to first order consists of taking the total measured current and dividing by the collector area and plasma bulk velocity. This method is only accurate for high Mach number flows directly into the detector, and may result in underestimates of the total density by a factor of 2 in the outer magnetosphere. Thus absolute densities should be treated with caution, but density variations in the data set can be trusted. The low resolution mode density is used at all times except day 190 2100-2200, when the larger of the high and low resolution mode densities in a 96 sec period is used. Corotation is assumed inside L=17.5, and a constant velocity component of 200 km/s into the D cup is used outside of this. These are the densities given in the McNutt et al. (1981) paper corrected by a factor of 1.209 (0.9617) for densities obtained from the side (main) sensor. This correction is due to a better calculation of the effective area of the sensors. Data format: columns 1-6 are time (year, day, hour, min, sec, msec); column 7 is the moment density in cm^-3. Each row has format (6I4, E12.3). Values of 1.E32 indicate that the parameter could not be obtained from the data using the standard analysis technique. Additional information about this dataset and the instrument which produced it can be found elsewhere in this catalog. An overview of the data in this data set can be found in McNutt et al. (1981) and a complete instrument description can be found in Bridge (1977).
This data set contains the best estimates of the total ion density from Voyager 2 at Jupiter in the PLS voltage range (10-5950 eV/Q). It is calculated using the method of McNutt et al. (1981), which to first order consists of taking the total measured current and dividing by the collector area and plasma bulk velocity. This method is only accurate for high Mach number flows directly into the detector, and may result in underestimates of the total density by a factor of 2 in the outer magnetosphere. Thus absolute densities should be treated with caution, but density variations in the data set can be trusted. The low resolution mode density is used at all times except day 190 2100-2200, when the larger of the high and low resolution mode densities in a 96 sec period is used. Corotation is assumed inside L=17.5, and a constant velocity component of 200 km/s into the D cup is used outside of this. Data format: column 1 is time (yyyy-mm-ddThh:mm:ss.sssZ), column 2 is the moment density in cm^-3. Each row has format (a24, 1x, 1pe9.2). Values of -9.99e+10 indicate that the parameter could not be obtained from the data using the standard analysis technique. Additional information about this data set and the instrument which produced it can be found elsewhere in this catalog.
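For convenience, a minimal Python sketch for reading rows in the stated format (a24, 1x, 1pe9.2) is shown below; the file name is a placeholder.

```python
# Minimal fixed-width reader for rows of a 24-character UTC time stamp,
# one space, and a 9-character density value. File name is a placeholder.
import pandas as pd

FILL = -9.99e10   # flag value: parameter could not be obtained

df = pd.read_fwf("vg2_pls_ion_density.tab",
                 widths=[24, 1, 9],
                 names=["time_utc", "_sp", "density_cm3"]).drop(columns="_sp")
df["time_utc"] = pd.to_datetime(df["time_utc"], format="%Y-%m-%dT%H:%M:%S.%fZ")
df.loc[df["density_cm3"] <= FILL, "density_cm3"] = float("nan")   # mask fill values
```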
The following description applies to the Wideband Data (WBD) Plasma Wave Receivers on all four Cluster satellites, each satellite being uniquely identified by its number (1 through 4) or its given name (Rumba, Salsa, Samba, Tango, respectively). High time resolution calibrated waveform data sampled in one of 3 frequency bands in the range 0-577 kHz along one axis using either an electric field antenna or a magnetic search coil sensor. The dataset also includes instrument mode, data quality and the angles required to orient the measurement with respect to the magnetic field and to the GSE coordinate system. The AC electric field data are obtained by using one of the two 88m spin plane electric field antennas of the EFW (Electric Fields and Waves) instrument as a sensor. The AC magnetic field data are obtained by using one of the two search coil magnetometers (one in the spin plane, the other along the spin axis) of the STAFF (Spatio-Temporal Analysis of Field Fluctuations) instrument as a sensor. The WBD data are obtained in one of three filter bandwidth modes: (1) 9.5 kHz, (2) 19 kHz, or (3) 77 kHz. The minimum frequency of each of these three frequency bands can be shifted up (converted) from the default 0 kHz base frequency by 125.454, 250.908 or 501.816 kHz. The time resolution of the data shown in the plots is determined from the WBD instrument mode. The highest time resolution data (generally the 77 kHz bandwidth mode) are sampled at 4.6 microseconds in the time domain (~4.7 milliseconds in the frequency domain using a standard 1024 point FFT). The lowest time resolution data (generally the 9.5 kHz bandwidth mode) are sampled at 36.5 microseconds in the time domain (~37.3 milliseconds in the frequency domain using a standard 1024 point FFT). The availability of these files depends on times of DSN and Panska Ves ground station telemetry downlinks. A list of the status of the WBD instrument on each spacecraft, the telemetry time spans, operating modes and other details are available under Science Data Availability on the University of Iowa Cluster WBD web site at http://www-pw.physics.uiowa.edu/cluster/ and through the documentation section of the Cluster Science Archive (CSA) (https://www.cosmos.esa.int/web/csa/documentation). Details on Cluster WBD Interpretation Issues and Caveats can be found at http://www-pw.physics.uiowa.edu/cluster/ by clicking on the links next to the Caution symbol in the listing on the left side of the web site. These documents are also available from the Documentation section of the CSA website. For further details on the Cluster WBD data products see Pickett, J.S., et al., "Cluster Wideband Data Products in the Cluster Active Archive" in The Cluster Active Archive, 2010, Springer-Verlag, pp 169-183, and the Cluster WBD User Guide archived at the CSA website in the Documentation section. ... CALIBRATION: ... The procedure used in computing the calibrated Electric Field and Magnetic Field values found in this file can be obtained from the Cluster WBD Calibration Report archived at the CSA website in the Documentation section. Because the calibration was applied in the time domain using simple equations the raw counts actually measured by the WBD instrument can be obtained by using these equations and solving for 'Raw Counts', keeping in mind that this number is an Integer ranging from 0 to 255. Since DC offset is a real number, the resultant when solving for raw counts will need to be converted to the nearest whole number.
A sample IDL routine for reverse calibrating to obtain 'Raw Counts' is provided in the WBD Calibration Report archived at the CSA. ... CONVERSION TO FREQUENCY DOMAIN: ... In order to convert the WBD data to the frequency domain via an FFT, the following steps need to be carried out: 1) If Electric Field, first divide calibrated data values by 1000 to get V/m; 2) Apply window of preference, if any (such as Hann, etc.); 3) Divide data values by sqrt(2) to get back to the rms domain; 4) perform FFT (see Bandwidth variable notes for non-continuous modes and/or the WBD User Guide archived at the CSA); 5) divide by the noise bandwidth, which is equal to the sampling frequency divided by the FFT size (see table below for appropriate sampling frequency); 6) multiply by the appropriate constant for the window used, if any. These steps are more fully explained in the WBD Calibration Report archived at the CSA.

Bandwidth    Sample Rate
9.5 kHz      27.443 kHz
19 kHz       54.886 kHz
77 kHz       219.544 kHz

COORDINATE SYSTEM USED: ... One axis measurements made in the Antenna Coordinate System, i.e., if electric field measurement, it will either be Ey or Ez, both of which are in the spin plane of the spacecraft, and if magnetic field measurement, it will either be Bx, along the spin axis, or By, in spin plane. The user of WBD data should refer to the WBD User Guide, archived at the CSA, Section 5.4.1 and Figure 5.3 for a description of the three orientation angles provided in these files. Since WBD measurements are made along one axis only, these three angles provide the only means for orienting the WBD measurements with respect to a geocentric coordinate system and to the magnetic field direction ...
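The numbered steps above can be followed almost literally. As an illustration, a minimal NumPy sketch is given below; the 77 kHz bandwidth mode (219.544 kHz sampling) and a Hann window are assumed, the input array is a placeholder, and whether a magnitude or power spectrum is wanted at step 4 is left open here (magnitude is used).

```python
# Illustrative sketch only, following the six conversion steps listed above.
# Assumptions: 77 kHz bandwidth mode, Hann window, electric-field data in
# mV/m; `samples` is a placeholder for a block of calibrated WBD samples.
import numpy as np

def wbd_to_frequency_domain(samples, fs=219.544e3, nfft=1024, window_constant=1.0):
    x = np.asarray(samples[:nfft], dtype=float)
    x = x / 1000.0                            # 1) electric field: mV/m -> V/m
    x = x * np.hanning(nfft)                  # 2) window of preference (Hann here)
    x = x / np.sqrt(2.0)                      # 3) back to the rms domain
    spectrum = np.abs(np.fft.rfft(x, nfft))   # 4) FFT (magnitude assumed)
    spectrum = spectrum / (fs / nfft)         # 5) divide by noise bandwidth (fs / FFT size)
    spectrum = spectrum * window_constant     # 6) constant for the window used
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, spectrum

# Example with synthetic data standing in for real calibrated WBD samples:
freqs, spec = wbd_to_frequency_domain(np.random.randn(1024))
```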