Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset provides detailed information on road surfaces from OpenStreetMap (OSM) data, distinguishing between paved and unpaved surfaces across the region. The information is based on road surface predictions derived from a hybrid deep learning approach. For more information on the methods, refer to the paper.
Roughly 0.3072 million km of roads are mapped in OSM in this region. Based on AI-mapped estimates, paved and unpaved roads account for approximately 0.0337 and 0.0178 million km, corresponding to 10.9845% and 5.798% of the total road length in the dataset region, respectively. For 0.2557 million km, or 83.2175% of the total road length, surface information is missing in OSM. To fill this gap, the Mapillary-derived road surface dataset provides an additional 0.0017 million km of information (corresponding to 0.6463% of the total missing road surface information).
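The shares quoted above can be approximately reproduced from the rounded lengths. The following is a minimal sketch with illustrative variable names; because the inputs are rounded to four decimals, the results agree with the published percentages only to about one decimal place.

```python
# Rounded lengths from the text, in million km.
total_km = 0.3072
paved_km = 0.0337
unpaved_km = 0.0178
missing_km = 0.2557
mapillary_km = 0.0017


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


paved_share = share(paved_km, total_km)
unpaved_share = share(unpaved_km, total_km)
missing_share = share(missing_km, total_km)
# Mapillary coverage is expressed relative to the missing length, not the total.
mapillary_share = share(mapillary_km, missing_km)

print(f"paved: {paved_share:.2f}%, unpaved: {unpaved_share:.2f}%, missing: {missing_share:.2f}%")
print(f"Mapillary fills {mapillary_share:.2f}% of the missing length")
```

The three lengths (paved, unpaved, missing) add up to the total mapped length, which is why the three percentages sum to 100%.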
It is intended for use in transportation planning, infrastructure analysis, climate emissions and geographic information system (GIS) applications.
This dataset provides comprehensive information on road and urban area features, including location, surface quality, and classification metadata. This dataset includes attributes from OpenStreetMap (OSM) data, AI predictions for road surface, and urban classifications.
AI features:
pred_class: Model-predicted class for the road surface, with values "paved" or "unpaved."
pred_label: Binary label associated with pred_class (0 = paved, 1 = unpaved).
osm_surface_class: Classification of the surface type from OSM, categorized as "paved" or "unpaved."
combined_surface_osm_priority: Surface classification combining pred_label and surface (OSM) while prioritizing the OSM surface tag, classified as "paved" or "unpaved."
combined_surface_DL_priority: Surface classification combining pred_label and surface (OSM) while prioritizing the DL prediction pred_label, classified as "paved" or "unpaved."
n_of_predictions_used: Number of predictions used for the feature length estimation.
predicted_length: Predicted length based on the DL model’s estimations, in meters.
DL_mean_timestamp: Mean timestamp of the predictions used, for comparison.
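One way to read the two combined_surface_* attributes is as a fallback rule: take the preferred source when it has a value, otherwise use the other one. The sketch below illustrates that reading only; the function name and the use of None for a missing value are assumptions, and the dataset's actual derivation may differ.

```python
from typing import Optional

LABEL_TO_CLASS = {0: "paved", 1: "unpaved"}  # pred_label encoding from the attribute list


def combine_surface(
    osm_surface_class: Optional[str],
    pred_label: Optional[int],
    prefer: str = "osm",
) -> Optional[str]:
    """Combine the OSM class and the DL prediction, preferring one source.

    Hypothetical helper: returns the preferred source's class when present,
    otherwise the other source's class, otherwise None.
    """
    if prefer not in ("osm", "dl"):
        raise ValueError("prefer must be 'osm' or 'dl'")
    pred_class = LABEL_TO_CLASS.get(pred_label) if pred_label is not None else None
    first, second = (
        (osm_surface_class, pred_class) if prefer == "osm" else (pred_class, osm_surface_class)
    )
    return first if first is not None else second


# A road tagged "paved" in OSM but predicted unpaved (label 1) by the model:
print(combine_surface("paved", 1, prefer="osm"))  # OSM tag takes priority
print(combine_surface("paved", 1, prefer="dl"))   # DL prediction takes priority
print(combine_surface(None, 0, prefer="osm"))     # no OSM tag, so the prediction is used
```

Under this reading the two attributes differ only for roads where the OSM tag and the prediction disagree.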
OSM features may have these attributes:
name: Name of the feature, if available in OSM.
name:en: Name of the feature in English, if available in OSM.
name:* (in local language): Name of the feature in the local official language, where available.
highway: Road classification based on OSM tags (e.g., residential, motorway, footway).
surface: Description of the surface material of the road (e.g., asphalt, gravel, dirt).
smoothness: Assessment of surface smoothness (e.g., excellent, good, intermediate, bad).
width: Width of the road, where available.
lanes: Number of lanes on the road.
oneway: Indicates if the road is one-way (yes or no).
bridge: Specifies if the feature is a bridge (yes or no).
layer: Indicates the layer of the feature in cases where multiple features are stacked (e.g., bridges, tunnels).
source: Source of the data, indicating the origin or authority of specific attributes.
Urban classification features may have these attributes:
continent: The continent where the data point is located (e.g., Europe, Asia).
country_iso_a2: The ISO Alpha-2 code representing the country (e.g., "US" for the United States).
urban: Binary indicator for urban areas based on the GHSU Urban Layer 2019 (0 = rural, 1 = urban).
urban_area: Name of the urban area or city where the data point is located.
osm_id: Unique identifier assigned by OpenStreetMap (OSM) to each feature.
osm_type: Type of OSM element (e.g., node, way, relation).
The data originates from OpenStreetMap (OSM) and is augmented with model predictions using images downloaded from Mapillary in combination with the GHSU Global Human Settlement Urban Layer 2019 and AFRICAPOLIS2020 urban layer.
This dataset is one of many HeiGIT exports on HDX. See the HeiGIT website for more information.
We are looking forward to hearing about your use-case! Feel free to reach out to us and tell us about your research at communications@heigit.org – we would be happy to amplify your work.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bangladesh BD: Urban Land Area data was reported at 56,970.021 sq km in 2015. This records an increase from the previous number of 51,109.241 sq km for 2000. Bangladesh BD: Urban Land Area data is updated yearly, averaging 51,109.241 sq km from Dec 1990 (Median) to 2015, with 3 observations. The data reached an all-time high of 56,970.021 sq km in 2015 and a record low of 46,298.409 sq km in 1990. Bangladesh BD: Urban Land Area data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Bangladesh – Table BD.World Bank.WDI: Environmental: Land Use, Protected Areas and National Wealth. Urban land area is given in square kilometers, based on a combination of population counts (persons), settlement points, and the presence of Nighttime Lights. Areas are defined as urban where contiguous lighted cells from the Nighttime Lights, or approximated urban extents based on buffered settlement points, have a total population greater than 5,000 persons. Source: Center for International Earth Science Information Network (CIESIN)/Columbia University. 2013. Urban-Rural Population and Land Area Estimates Version 2. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://sedac.ciesin.columbia.edu/data/set/lecz-urban-rural-population-land-area-estimates-v2. Aggregation method: Sum.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The "Bangladesh Flight Fare Dataset" is a synthetic dataset comprising 57,000 flight records tailored to represent air travel scenarios originating from Bangladesh. This dataset simulates realistic flight fare dynamics, capturing key factors such as airline operations, airport specifics, travel classes, booking behaviors, and seasonal variations specific to Bangladesh’s aviation market. It is designed for researchers, data scientists, and analysts interested in flight fare prediction, travel pattern analysis, or machine learning/deep learning applications. By combining real-world inspired statistical distributions and aviation industry standards, this dataset provides a robust foundation for exploring flight economics in a South Asian context.
This dataset aims to:
- Facilitate predictive modeling of flight fares, with "Total Fare (BDT)" as the primary target variable.
- Enable analysis of travel trends, including the impact of cultural festivals (e.g., Eid, Hajj) and booking timings on pricing.
- Serve as a training resource for machine learning (ML) and deep learning (DL) models, with sufficient sample size (50,000) and feature diversity for generalization.
- Provide a realistic yet synthetic representation of Bangladesh’s air travel ecosystem, blending domestic and international flight scenarios.
The dataset is synthetically generated using Python, with its methodology rooted in real-world aviation data and statistical principles. Below is a detailed breakdown of its construction:
Distance:
Purpose: Determines flight duration, aircraft type, and stopovers.
Source: Wikipedia - Haversine Formula.
Flight Duration:
Formula: Duration = max(d/s · U(0.9, 1.1), 0.5), where s is speed (300 km/h for <500 km, 600 km/h for 500-2000 km, 900 km/h for >2000 km), and U is uniform random variation.
Source: Speeds adjusted from World Atlas, ensuring realism (e.g., DAC to CGP ~45 minutes).
Fares:
Base Fares:
Domestic: Economy (2000-5000 BDT), Business (5000-10000 BDT), First Class (10000-15000 BDT).
International: Economy (5000-70000 BDT), Business (15000-150000 BDT), First Class (25000-300000 BDT).
Source: Derived from Trip.com and Expedia, e.g., DAC to LHR ~$380-600 (~41800-66000 BDT at 1 USD = 110 BDT).
Adjustments:
Seasonal multipliers (Regular: 1.0, Eid: 1.3, Hajj: 1.5, Winter: 1.2), per demand trends from Timeanddate.com.
Days Before Departure: 20% discount (60+ days), 10% discount (30-59 days), 20% surge (<5 days), per Skyscanner.
Taxes: Domestic: 200 BDT; International: 2000-6000 BDT + 15% base fare, per [Bangladesh Civil Aviation Authority](https://www.dgca.g...
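The distance, duration, and fare-adjustment rules above can be sketched in a few functions. This is an illustration rather than the dataset's generator: the function names and the example coordinates are placeholders, combining the seasonal and booking adjustments multiplicatively is an assumption, bookings made 5 to 29 days ahead are assumed to carry no adjustment, the handling of the exact 500 km and 2000 km boundaries is a guess, and taxes are omitted because the rule is cut off in the text.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common choice for the haversine formula
SEASON_MULTIPLIER = {"Regular": 1.0, "Eid": 1.3, "Hajj": 1.5, "Winter": 1.2}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cruise_speed_kmh(distance_km):
    """Speed tiers from the text; boundary handling at 500 and 2000 km is assumed."""
    if distance_km < 500:
        return 300.0
    if distance_km <= 2000:
        return 600.0
    return 900.0


def flight_duration_h(distance_km, rng=random):
    """Duration = max(d/s * U(0.9, 1.1), 0.5), in hours."""
    s = cruise_speed_kmh(distance_km)
    return max(distance_km / s * rng.uniform(0.9, 1.1), 0.5)


def booking_factor(days_before_departure):
    """Discount or surge by lead time; 5-29 days is assumed to be neutral."""
    if days_before_departure >= 60:
        return 0.8
    if days_before_departure >= 30:
        return 0.9
    if days_before_departure < 5:
        return 1.2
    return 1.0


def adjusted_base_fare(base_fare_bdt, season, days_before_departure):
    """Apply seasonal and booking adjustments multiplicatively (an assumption)."""
    return base_fare_bdt * SEASON_MULTIPLIER[season] * booking_factor(days_before_departure)


# Placeholder coordinates one degree of latitude apart (about 111 km).
d = haversine_km(23.0, 90.0, 24.0, 90.0)
print(round(d, 1), "km")
print(round(flight_duration_h(d, random.Random(0)), 2), "h")
print(adjusted_base_fare(3000, "Eid", 70), "BDT")
```

For very short hops the 0.5-hour floor in the duration formula dominates, so the random variation has no effect on the result.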
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bangladesh Land Use: Total Area data was reported at 147,570.000 sq km in 2022. This stayed constant from the previous number of 147,570.000 sq km for 2021. Bangladesh Land Use: Total Area data is updated yearly, averaging 148,460.000 sq km from Dec 1961 (Median) to 2022, with 62 observations. The data reached an all-time high of 148,460.000 sq km in 2013 and a record low of 147,570.000 sq km in 2022. Bangladesh Land Use: Total Area data remains in active status in CEIC and is reported by Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s Bangladesh – Table BD.OECD.ESG: Environmental: Land Use: Non OECD Member: Annual.
The Cox’s Bazar Panel Survey (CBPS) was completed in August 2019, through a partnership between the Yale Macmillan Center Program on Refugees, Forced Displacement, and Humanitarian Responses (Yale Macmillan PRFDHR), the Gender & Adolescence: Global Evidence (GAGE) program, the Poverty and Equity Global Practice of the World Bank and the State and Peacebuilding Fund (SPF) administered by the World Bank. It is a representative survey of the post-2017 population of displaced Rohingya and households in host communities in the Cox’s Bazar district in Bangladesh.
The high-frequency phone tracking (HFT) surveys were built to maintain communication with baseline respondents while collecting rapid data on key welfare indicators on labor, basic needs and education. Three rounds of the HFT were completed between 2020 and 2021 and have been used to produce welfare updates on the host and Rohingya populations residing in Cox's Bazar, Bangladesh, particularly amidst the COVID-19 crisis.
The tracking surveys collected information across three broad welfare dimensions: labor, access to basic needs and education status of school-aged children. Round 1 collected information on labor and access to basic needs only; the module on education was added from Round 2 onwards.
Cox's Bazar district and some parts of Bandarban district.
Households and individuals
a) Rohingya population living in camps and b) host population within Cox's Bazar and Bandarban district.
Sample survey data [ssd]
The CBPS study has a total sample size of 5,020 households (HHs), divided among three strata covering Rohingya refugees in camps and host communities in Cox’s Bazar district and some adjacent regions of Bandarban district. The CBPS HFT attempted to follow the full baseline sample of 5,020 households in each round, with no alterations or additions made to the sampling design. The baseline sampling strategy is detailed below.
The three strata are defined as:
i. Rohingya refugees in camps
ii. High exposure hosts: hosts within 15 km (3-hour walking distance) of camps
iii. Low exposure hosts: hosts at more than 15 km (3-hour walking distance) from camps
(In the datasets, the 'settlement_type' and 'stratum' variables identify the different levels at which the sample is representative)
Defining the camp strata: A two-step data collection on Rohingya refugee prevalence within host communities (i.e., outside of camps) confirmed that prevalence in host communities was low, and that this was the case not only for newer Rohingya displaced but also for the older cohort of displaced. This pattern of refugee prevalence supported having one stratum for the Rohingya displaced living in camps. The sampling strategy for the CBPS therefore focused on generating representative estimates for the camp-based Rohingya population in Cox’s Bazar district.
Defining the host strata: For hosts, the sampling strategy was designed to account for the differential implications of a camp-based concentration of close to a million Rohingya displaced for different areas of Cox’s Bazar. To distinguish between host communities that are differentially affected by the arrival of the Rohingya, the CBPS sampling strategy used a threshold of three hours’ walking time from a campsite to define two survey strata: (i) host communities with potentially high exposure (HE) to the displaced Rohingya, and (ii) host communities with potentially low exposure (LE).
Sampling frame: The camp sample uses the Needs and Population Monitoring Round 12 (NPM12) data from the International Organization for Migration as the sampling frame. For the host sample, a combination of the 2011 population census, Admin 4 shapefiles from the Bureau of Statistics, publicly available Google Earth imagery and OpenStreetMap was used to develop a sampling frame.
Stages of sample selection: For camps, NPM12 divided all camps into 1,954 majhee blocks. Of these, 200 blocks were randomly selected with probability proportional to the size of the camp. A full listing was carried out in each selected camp block.
For hosts, a two-stage sampling strategy was followed. The first stage of selection was done at the mauza level by strata. A random sample of 66 mauzas was drawn from a frame of 286 mauzas using probability proportional to size. Based on census population size, each mauza was divided into segments of roughly 100-150 households. The second stage selected three segments from each selected mauza with equal probability of selection.
Listing and replacements: Within each selected PSU in camps (blocks) and hosts (mauza-segments), all households (100-150 on average) were listed. Of the listed households, 13 were selected at random for interview, with an additional replacement list of 5 households. More information on the sampling strategy and process can be found in the published working paper titled “Data Triangulation Strategies to Design a Representative Household Survey of Hosts and Rohingya Displaced in Cox’s Bazar, Bangladesh”.
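Selection with probability proportional to size (PPS), followed by a random draw of households from a listing, can be illustrated with a short sketch. It uses systematic PPS sampling, which is one common way to implement PPS and is an assumption here: the text only states that selection was proportional to size. The block sizes, function names, and seed are made up for illustration.

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Select n unit indices with probability proportional to size.

    Systematic PPS: lay the units end to end on a line of length sum(sizes),
    pick a random start in [0, step), then take every step-th point.
    A unit larger than the step can be hit more than once.
    """
    total = sum(sizes)
    step = total / n
    start = rng.uniform(0, step)
    cumulative = list(accumulate(sizes))
    selected, unit = [], 0
    for k in range(n):
        point = start + k * step
        # Advance to the unit whose cumulative range contains the point.
        while unit < len(sizes) - 1 and cumulative[unit] <= point:
            unit += 1
        selected.append(unit)
    return selected


def draw_households(household_ids, n_main=13, n_replacement=5, rng=random):
    """Randomly pick interview households plus a replacement list from a listing."""
    picked = rng.sample(household_ids, n_main + n_replacement)
    return picked[:n_main], picked[n_main:]


rng = random.Random(42)
# Hypothetical frame: 1,954 blocks with made-up household counts.
block_sizes = [rng.randint(50, 200) for _ in range(1954)]
blocks = systematic_pps(block_sizes, 200, rng)
# Hypothetical listing of 120 households in one selected block.
main, replacements = draw_households(list(range(120)), rng=rng)
print(len(blocks), len(main), len(replacements))
```

Drawing the 13 interview households and the 5 replacements in a single sample keeps the two lists disjoint.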
While the original sampling strategy was designed to be representative of all camp-based Rohingya displaced, campsites with older Rohingya displaced refused to participate in the listing due to other political sensitivities. This refusal was maintained despite many attempts. Since the older Rohingya displaced were not a separate stratum, a decision was made to drop these households from the survey. Therefore, the attained sample does not contain registered refugees from the two camps – Kutupalong RC and Nayapara RC.
The host sample covers six out of eight upazilas in Cox’s Bazar District (Chakaria, Cox’s Bazar Sadar, Pekua, Ramu, Teknaf, and Ukhia upazilas) and one upazila in Bandarban District (Naikhongchhori upazila). The two upazilas not covered within the sample are the islands of Kutubdia and Maheshkhali.
Computer Assisted Personal Interview [capi]
The R1 tracking questionnaire was developed as a lean version of the questionnaire implemented during the CBPS baseline. The R2 and R3 questionnaires retained certain aspects of the R1 questionnaire, but also added more detailed questions on aspects such as food security (in consultation with UN-WFP) and credit-seeking and coping behavior based on findings observed in previous rounds and dynamic research needs within the COVID-19 crisis.
One questionnaire was developed per round of data collection, with modules containing household-level questions on access to basic needs, credit-seeking behavior, access to health services and vaccinations, and individual-level questions on labor market status. Any adult, knowledgeable member of the confirmed sample household was eligible to answer the household modules. The labor module was administered only if the respondent reached was one of the 2-3 selected adults within the household who had completed the baseline adult questionnaires.
Questionnaires were developed in English and translated into Bengali. The translations to Bengali were thoroughly reviewed by the World Bank team’s local consultants to ensure quality. Pretesting and piloting were done using the Bengali questionnaires.
All questionnaires and modules in English are provided as external resources.
Data was collected through computer-assisted telephone interviews via SurveyCTO, an ODK-based platform. Maintenance of correct questionnaire flow was ensured through in-built skips and logic checks within the programmed questionnaire.
No manual data corrections were made on submitted interviews by the data processing team. Interviews flagged as needing field corrections due to mistaken entries were re-submitted by enumerators before data collection closed in each round, following strict evaluation by the project team and a close review of the concerns raised and automatically filtered by the program.
In addition to logic checks within the survey program itself, extensive data consistency checks and quality indicators were developed by the WB team to monitor data quality during survey implementation. Field debriefs were held frequently during the piloting phase and the first week of data collection, and once a week in later weeks, to provide feedback to enumerators and gain clarity on data quality concerns.
After data collection, structural and consistency checks were conducted on each round's dataset and across datasets from different rounds.
The household-level response rates for each round of the CBPS HFT, based on the baseline sample of 5,020 and disaggregated by stratum, are:
Round 1: Overall: 67%; Camps: 54%; High exposure: 71%; Low exposure: 72%
Round 2: Overall: 72%; Camps: 63%; High exposure: 81%; Low exposure: 80%
Round 3: Overall: 68%; Camps: 55%; High exposure: 81%; Low exposure: 80%
*Note that the Round 1 tracking exercise was a joint effort between the Yale Y-Rise team and the WB team. The Yale team contacted and surveyed a randomly selected 25% of baseline households, while the WB team completed the remaining 75%. The Round 1 dataset contains data on this segment of the sample only, as the welfare surveys implemented by the teams were different.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bangladesh BD: Land Area data was reported at 130,170.000 sq km in 2022. This stayed constant from the previous number of 130,170.000 sq km for 2021. Bangladesh BD: Land Area data is updated yearly, averaging 130,170.000 sq km from Dec 1961 (Median) to 2022, with 62 observations. The data reached an all-time high of 130,170.000 sq km in 2022 and a record low of 130,170.000 sq km in 2022. Bangladesh BD: Land Area data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Bangladesh – Table BD.World Bank.WDI: Environmental: Land Use, Protected Areas and National Wealth. Land area is a country's total area, excluding area under inland water bodies, national claims to continental shelf, and exclusive economic zones. In most cases the definition of inland water bodies includes major rivers and lakes. Source: Food and Agriculture Organization, electronic files and web site. Aggregation method: Sum.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bangladesh BD: Rural Land Area data was reported at 79,328.863 sq km in 2015. This records a decrease from the previous number of 85,189.642 sq km for 2000. Bangladesh BD: Rural Land Area data is updated yearly, averaging 85,189.642 sq km from Dec 1990 (Median) to 2015, with 3 observations. The data reached an all-time high of 90,000.474 sq km in 1990 and a record low of 79,328.863 sq km in 2015. Bangladesh BD: Rural Land Area data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Bangladesh – Table BD.World Bank.WDI: Environmental: Land Use, Protected Areas and National Wealth. Rural land area is given in square kilometers, derived from urban extent grids which distinguish urban and rural areas based on a combination of population counts (persons), settlement points, and the presence of Nighttime Lights. Areas are defined as urban where contiguous lighted cells from the Nighttime Lights, or approximated urban extents based on buffered settlement points, have a total population greater than 5,000 persons. Source: Center for International Earth Science Information Network (CIESIN)/Columbia University. 2013. Urban-Rural Population and Land Area Estimates Version 2. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://sedac.ciesin.columbia.edu/data/set/lecz-urban-rural-population-land-area-estimates-v2. Aggregation method: Sum.