Overview
The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. The current edition is version 11.4 (published 24 February 2025). The 11.4 release contains updated boundary lines and data refinements designed to extend the functionality of the dataset. These data and generalized derivatives are the only international boundary lines approved for U.S. Government use. The contents of this dataset reflect U.S. Government policy on international boundary alignment, political recognition, and dispute status. They do not necessarily reflect de facto limits of control.
National Geospatial Data Asset
This dataset is a National Geospatial Data Asset (NGDAID 194) managed by the Department of State. It is a part of the International Boundaries Theme created by the Federal Geographic Data Committee.
Dataset Source Details
Sources for these data include treaties, relevant maps, and data from boundary commissions, as well as national mapping agencies. Where available and applicable, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery process includes analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.
Cartographic Visualization
The LSIB is a geospatial dataset that, when used for cartographic purposes, requires additional styling. The LSIB download package contains example style files for commonly used software applications. The attribute table also contains embedded information to guide the cartographic representation. Additional discussion of these considerations can be found in the Use of Core Attributes in Cartographic Visualization section below.
Additional cartographic information pertaining to the depiction and description of international boundaries or areas of special sovereignty can be found in Guidance Bulletins published by the Office of the Geographer and Global Issues: https://data.geodata.state.gov/guidance/index.html
Contact
Direct inquiries to internationalboundaries@state.gov.
Direct download: https://data.geodata.state.gov/LSIB.zip
Attribute Structure
The dataset uses the following attributes divided into two categories:
ATTRIBUTE NAME | ATTRIBUTE STATUS |
---|---|
CC1 | Core |
CC1_GENC3 | Extension |
CC1_WPID | Extension |
COUNTRY1 | Core |
CC2 | Core |
CC2_GENC3 | Extension |
CC2_WPID | Extension |
COUNTRY2 | Core |
RANK | Core |
LABEL | Core |
STATUS | Core |
NOTES | Core |
LSIB_ID | Extension |
ANTECIDS | Extension |
PREVIDS | Extension |
PARENTID | Extension |
PARENTSEG | Extension |
These attributes have external data sources that update separately from the LSIB:
ATTRIBUTE NAME | EXTERNAL SOURCE |
---|---|
CC1 | GENC |
CC1_GENC3 | GENC |
CC1_WPID | World Polygons |
COUNTRY1 | DoS Lists |
CC2 | GENC |
CC2_GENC3 | GENC |
CC2_WPID | World Polygons |
COUNTRY2 | DoS Lists |
LSIB_ID | BASE |
ANTECIDS | BASE |
PREVIDS | BASE |
PARENTID | BASE |
PARENTSEG | BASE |
The core attributes listed above describe the boundary lines contained within the LSIB dataset. Removal of core attributes from the dataset will change the meaning of the lines. An attribute status of “Extension” represents a field containing data interoperability information. Other attributes not listed above include “FID”, “Shape_length” and “Shape.” These are components of the shapefile format and do not form an intrinsic part of the LSIB.
Core Attributes
The eight core attributes listed above contain unique information which, when combined with the line geometry, comprise the LSIB dataset. These Core Attributes are further divided into Country Code and Name Fields and Descriptive Fields.
Country Code and Country Name Fields
“CC1” and “CC2” fields are machine readable fields that contain political entity codes.
These are two-character codes derived from the Geopolitical Entities, Names, and Codes Standard (GENC), Edition 3 Update 18. “CC1_GENC3” and “CC2_GENC3” fields contain the corresponding three-character GENC codes and are extension attributes discussed below. The codes “Q2” or “QX2” denote a line in the LSIB representing a boundary associated with areas not contained within the GENC standard.
The “COUNTRY1” and “COUNTRY2” fields contain the names of corresponding political entities. These fields contain names approved by the U.S. Board on Geographic Names (BGN) as incorporated in the "Independent States in the World" and "Dependencies and Areas of Special Sovereignty" lists maintained by the Department of State. To ensure maximum compatibility, names are presented without diacritics and certain names are rendered using common cartographic abbreviations. Names for lines associated with the code "Q2" are descriptive and not necessarily BGN-approved. Names rendered in all CAPITAL LETTERS denote independent states. Names rendered in normal text represent dependencies, areas of special sovereignty, or are otherwise presented for the convenience of the user.
Descriptive Fields
The following text fields are a part of the core attributes of the LSIB dataset and do not update from external sources. They provide additional information about each of the lines and are as follows:
ATTRIBUTE NAME | CONTAINS NULLS |
---|---|
RANK | No |
STATUS | No |
LABEL | Yes |
NOTES | Yes |
Neither the "RANK" nor "STATUS" fields contain null values; the "LABEL" and "NOTES" fields do. The "RANK" field is a numeric expression of the "STATUS" field. Combined with the line geometry, these fields encode the views of the United States Government on the political status of the boundary line.
ATTRIBUTE NAME | VALUE | VALUE | VALUE |
---|---|---|---|
RANK | 1 | 2 | 3 |
STATUS | International Boundary | Other Line of International Separation | Special Line |
A value of “1” in the “RANK” field corresponds to an "International Boundary" value in the “STATUS” field. Values of “2” and “3” correspond to “Other Line of International Separation” and “Special Line,” respectively. The “LABEL” field contains required text to describe the line segment on all finished cartographic products, including but not limited to print and interactive maps. The “NOTES” field contains an explanation of special circumstances modifying the lines. This information can pertain to the origins of the boundary lines, limitations regarding the purpose of the lines, or the original source of the line.
Use of Core Attributes in Cartographic Visualization
Several of the Core Attributes provide information required for the proper cartographic representation of the LSIB dataset. The cartographic usage of the LSIB requires a visual differentiation between the three categories of boundary lines. Specifically, this differentiation must be between: International Boundaries (Rank 1); Other Lines of International Separation (Rank 2); and Special Lines (Rank 3). Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the “LABEL” field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Please consult the style files in the download package for examples of this depiction.
The requirement to incorporate the contents of the "LABEL" field on cartographic products is scale dependent. If a label is legible at the scale of a given static product, a proper use of this dataset would encourage the application of that label.
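The RANK-to-STATUS correspondence and the prominence ordering described above can be sketched in a few lines of Python. This is only an illustration: the line widths and dash patterns are made-up placeholders rather than values from the LSIB style files, and the names `RANK_STYLE`, `describe_line`, and `widths_decrease_with_rank` are hypothetical.

```python
# RANK -> STATUS mapping as stated in the LSIB documentation.
RANK_TO_STATUS = {
    1: "International Boundary",
    2: "Other Line of International Separation",
    3: "Special Line",
}

# Hypothetical styling: widths and dash patterns are illustrative assumptions,
# chosen only so that prominence decreases from Rank 1 to Rank 3.
RANK_STYLE = {
    1: {"width": 1.5, "dash": None},
    2: {"width": 1.0, "dash": (4, 2)},
    3: {"width": 0.6, "dash": (1, 2)},
}


def describe_line(rank, label=None):
    """Return legend text, style, and label text for one boundary line."""
    if rank not in RANK_TO_STATUS:
        raise ValueError(f"unexpected RANK value: {rank!r}")
    return {
        "legend": RANK_TO_STATUS[rank],  # STATUS text is meant for the legend
        "style": RANK_STYLE[rank],
        "label": label or None,          # LABEL may be null or empty
    }


def widths_decrease_with_rank(styles):
    """Check that each rank is drawn narrower than the rank before it."""
    widths = [styles[rank]["width"] for rank in sorted(styles)]
    return all(a > b for a, b in zip(widths, widths[1:]))
```

A renderer would still need to decide, per map scale, whether the label returned by `describe_line` is legible enough to draw.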
Using the contents of the "COUNTRY1" and "COUNTRY2" fields in the generation of a line segment label is not required. The "STATUS" field contains the preferred description for the three LSIB line types when they are incorporated into a map legend but is otherwise not to be used for labeling. Use of the “CC1,” “CC1_GENC3,” “CC2,” “CC2_GENC3,” “RANK,” or “NOTES” fields for cartographic labeling purposes is prohibited.
Extension Attributes
Certain elements of the attributes within the LSIB dataset extend data functionality to make the data more interoperable or to provide clearer linkages to other datasets. The fields “CC1_GENC3” and “CC2_GENC3” contain the three-character GENC codes corresponding to the “CC1” and “CC2” attributes. The code “QX2” is the three-character counterpart of the code “Q2,” which denotes a line in the LSIB representing a boundary associated with a geographic area not contained within the GENC standard.
To allow for linkage between individual lines in the LSIB and World Polygons dataset, the “CC1_WPID” and “CC2_WPID” fields contain a Universally Unique Identifier (UUID), version 4, which provides a stable description of each geographic entity in a boundary pair relationship. Each UUID corresponds to a geographic entity listed in the World Polygons dataset. These fields allow for linkage between individual lines in the LSIB and the overall World Polygons dataset.
Five additional fields in the LSIB expand on the UUID concept and either describe features that have changed across space and time or indicate relationships between previous versions of the feature. The “LSIB_ID” attribute is a UUID value that defines a specific instance of a feature. Any change to the feature in a lineset requires a new “LSIB_ID.” The “ANTECIDS,” or antecedent ID, is a UUID that references line geometries from which a given line is descended in time. It is used when there is a feature that is entirely new, not when there is a new version of a previous feature.
This is generally used to reference countries that have dissolved. The “PREVIDS,” or Previous ID, is a UUID field that contains old versions of a line. This is an additive field that houses all Previous IDs. A new version of a feature is defined by any change to the
https://en.wikipedia.org/wiki/Public_domain
Country codes: ISO 2, ISO 3, UN, LANG, LABEL (EN, FR, SP)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (ca. 80.000)
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
THIS IS A PRE-RELEASE, WHILE THE CARAVAN IS UNDER REVISION.
Check out the preprint at: https://eartharxiv.org/repository/view/3345/ (accepted for publication at Nature Scientific Data).
Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.
Channel Log:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is the accompanying dataset to the following paper https://www.nature.com/articles/s41597-023-01975-w
Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.
If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, to acknowledge the amount of work that went into creating these datasets and that made Caravan possible in the first place.
All current development and additional community extensions can be found at https://github.com/kratzert/Caravan
IMPORTANT: Due to size limitations for individual repositories, the netCDF version and the CSV version of Caravan (since Version 1.6) are split into two different repositories. You can find the netCDF version at https://zenodo.org/records/14673536
Channel Log:
https://creativecommons.org/publicdomain/zero/1.0/
According to Wikipedia, an ultramarathon, also called ultra distance or ultra running, is any footrace longer than the traditional marathon length of 42.195 kilometres (26 mi 385 yd). Various distances are raced competitively, from the shortest common ultramarathon of 31 miles (50 km) to over 200 miles (320 km). 50k and 100k are both World Athletics record distances, but some 100-mile (160 km) races are among the oldest and most prestigious events, especially in North America.
The data in this file is a large collection of ultra-marathon race records registered between 1798 and 2022 (a period of well over two centuries), making it a formidable long-term sample. All data was obtained from public websites.
Although the original data is in the public domain, the race records, which originally contained the athletes' names, have been anonymized to comply with data protection laws and to preserve the athletes' privacy. However, a column Athlete ID has been created with a numerical ID representing each unique runner (so if Antonio Fernández participated in 5 races over different years, then the corresponding race records now hold his unique Athlete ID instead of his name). This way I have preserved valuable information.
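The idea of replacing names with a per-runner numerical ID can be sketched as follows. This is a minimal illustration of the concept, not the procedure actually used to build the dataset; the function name `assign_athlete_ids` and the second runner name are invented for the example.

```python
def assign_athlete_ids(names):
    """Map each distinct name to an integer ID, in order of first appearance.

    The same name always receives the same ID, so race records of one runner
    stay linked after the name column is dropped.
    """
    ids = {}
    # setdefault returns the existing ID, or stores and returns the next free one
    return [ids.setdefault(name, len(ids)) for name in names]


# Three race records, two of them by the same (example) runner
records = ["Antonio Fernández", "Maria Silva", "Antonio Fernández"]
athlete_ids = assign_athlete_ids(records)
print(athlete_ids)  # [0, 1, 0]
```

Note that two different runners who share a name would collapse into one ID under this naive scheme; a real pipeline would need additional fields to tell them apart.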
The dataset contains 7,461,226 ultra-marathon race records from 1,641,168 unique athletes.
The following columns (with data types) are included:
The Event name column includes country location information that can be extracted into a new column, and similarly seasonal information can be found in the Event dates column beyond the Year of event (these can be extracted with a bit of processing).
The Event distance/length column describes the type of race, covering the most popular UM race distances and lengths, and some other specific modalities (multi-day, etc.):
Additionally, other columns contain information on age, gender, and speed (in km/h).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Large go-around, also referred to as missed approach, data set. The data set is in support of the paper presented at the OpenSky Symposium on November the 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:
Column name | Type | Description |
---|---|---|
time | date time | UTC time of landing or first GA attempt |
icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
callsign | string | Aircraft identifier in air-ground communications |
airport | string | ICAO airport code where the aircraft is landing |
runway | string | Runway designator on which the aircraft landed |
has_ga | string | "True" if at least one GA was performed, otherwise "False" |
n_approaches | integer | Number of approaches identified for this flight |
n_rwy_approached | integer | Number of unique runways approached by this flight |
The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These flights usually have a large number of approaches, so an easy way to exclude them is to drop flights with n_approaches > 2.
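The filter described above can be sketched with the standard library alone. The sample rows below are invented for illustration (including the timestamp format), and `load_regular_landings` is a hypothetical helper; only the column names come from the table above.

```python
import csv
import io

# Invented sample in the layout of go_arounds_minimal.csv (values are made up)
SAMPLE = """time,icao24,callsign,airport,runway,has_ga,n_approaches,n_rwy_approached
2019-01-04 10:00:00,abc123,TST1,EGLC,27,False,1,1
2019-01-04 11:00:00,abc124,TST2,EGLC,27,True,2,1
2019-01-04 12:00:00,abc125,CAL1,EGLC,09,True,7,2
"""


def load_regular_landings(handle, max_approaches=2):
    """Yield rows with at most max_approaches approaches.

    Rows above the threshold are assumed to be training or calibration
    flights and are skipped. has_ga is converted from "True"/"False" to bool.
    """
    for row in csv.DictReader(handle):
        if int(row["n_approaches"]) <= max_approaches:
            row["has_ga"] = row["has_ga"] == "True"
            yield row


rows = list(load_regular_landings(io.StringIO(SAMPLE)))
print([row["callsign"] for row in rows])  # ['TST1', 'TST2']
```

For the real file, the same function should work on a handle opened with `gzip.open("go_arounds_minimal.csv.gz", "rt", newline="")`; with pandas, the equivalent filter is a boolean mask on the n_approaches column.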
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:
Column name | Type | Description |
---|---|---|
time | date time | UTC time of landing or first GA attempt |
icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
callsign | string | Aircraft identifier in air-ground communications |
airport | string | ICAO airport code where the aircraft is landing |
runway | string | Runway designator on which the aircraft landed |
has_ga | string | "True" if at least one GA was performed, otherwise "False" |
n_approaches | integer | Number of approaches identified for this flight |
n_rwy_approached | integer | Number of unique runways approached by this flight |
registration | string | Aircraft registration |
typecode | string | Aircraft ICAO typecode |
icaoaircrafttype | string | ICAO aircraft type |
wtc | string | ICAO wake turbulence category |
glide_slope_angle | float | Angle of the ILS glide slope in degrees |
has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
rwy_length | float | Length of the runway in kilometres |
airport_country | string | ISO Alpha-3 country code of the airport |
airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
operator_country | string | ISO Alpha-3 country code of the operator |
operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania) |
wind_speed_knts | integer | METAR, surface wind speed in knots |
wind_dir_deg | integer | METAR, surface wind direction in degrees |
wind_gust_knts | integer | METAR, surface wind gust speed in knots |
visibility_m | float | METAR, visibility in m |
temperature_deg | integer | METAR, temperature in degrees Celsius |
press_sea_level_p | float | METAR, sea level pressure in hPa |
press_p | float | METAR, QNH in hPa |
weather_intensity | list | METAR, list of present weather codes: qualifier - intensity |
weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation |
weather_desc | list | METAR, list of present weather codes: qualifier - descriptor |
weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration |
weather_other | list | METAR, list of present weather codes: weather phenomena - other |
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different websites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:
Column name | Type | Description |
---|---|---|
airport | string | ICAO airport code where the aircraft is landing |
runway | string | Runway designator on which the aircraft landed |
n_landings | integer | Total number of landings observed on this runway in 2019 |
ga_rate | float | Go-around rate, per 1000 landings |
glide_slope_angle | float | Angle of the ILS glide slope in degrees |
has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
rwy_length | float | Length of the runway in kilometres |
airport_country | string | ISO Alpha-3 country code of the airport |
airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
This aggregated data set is used in the paper for the generalized linear regression model.
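The ga_rate column is described as a rate per 1000 landings. A minimal sketch of that arithmetic follows, assuming the rate is simply the number of landings with a go-around divided by the number of landings; the function name `ga_rate_per_1000` is hypothetical and the paper's exact definition may differ.

```python
def ga_rate_per_1000(n_go_arounds, n_landings):
    """Go-arounds per 1000 landings (assumed definition: 1000 * GAs / landings)."""
    if n_landings <= 0:
        raise ValueError("n_landings must be positive")
    return 1000.0 * n_go_arounds / n_landings


# Rounded totals quoted in the data set description:
# about 33000 go-arounds in about 9 million landings
overall_rate = ga_rate_per_1000(33_000, 9_000_000)
```

With these rounded totals the overall rate comes out at roughly 3.7 go-arounds per 1000 landings, which gives a sense of scale for the per-runway values.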
Downloading the trajectories
Users of this data set with access to OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. For example, suppose you want to get all the go-arounds of the 4th of January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:
import datetime
from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic

# load minimum data set
df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

# select London City Airport, go-arounds, and 2019-01-04
airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(
    tzinfo=datetime.timezone.utc
)
stop = datetime.datetime(year=2019, month=1, day=5).replace(
    tzinfo=datetime.timezone.utc
)
df_selection = df.query("airport==@airport & has_ga
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset provides high-resolution (10 m) industrial land maps for 1,093 global cities from 2017 to 2023.
The dataset includes:
Industrial_land_XXX_YYY_YEAR.tif
Industrial_land_USA_634_2017.tif represents the industrial land map for Chicago, USA, in 2017. Each TIF file has a 10 m spatial resolution with the GCS_WGS_1984 spatial projection. The maps include three classes:
A detailed summary of city-specific information, including the annual total industrial land area, is provided in 1093_city_information.xlsx. This file includes:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Login Data Set for Risk-Based Authentication
Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.
This data set aims to foster research and development for Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.
The users used this SSO to access sensitive data provided by the online service, e.g., cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce the results obtained on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.
WARNING: The feature values are plausible, but still totally artificial. Therefore, you should NOT use this data set in production systems, e.g., intrusion detection systems.
Overview
The data set contains the following features related to each login attempt on the SSO:
Feature | Data Type | Description | Range or Example |
---|---|---|---|
IP Address | String | IP address belonging to the login attempt | 0.0.0.0 - 255.255.255.255 |
Country | String | Country derived from the IP address | US |
Region | String | Region derived from the IP address | New York |
City | String | City derived from the IP address | Rochester |
ASN | Integer | Autonomous system number derived from the IP address | 0 - 600000 |
User Agent String | String | User agent string submitted by the client | Mozilla/5.0 (Windows NT 10.0; Win64; ... |
OS Name and Version | String | Operating system name and version derived from the user agent string | Windows 10 |
Browser Name and Version | String | Browser name and version derived from the user agent string | Chrome 70.0.3538 |
Device Type | String | Device type derived from the user agent string | (mobile, desktop, tablet, bot, unknown) [1] |
User ID | Integer | Identification number related to the affected user account | [Random pseudonym] |
Login Timestamp | Integer | Timestamp related to the login attempt | [64 Bit timestamp] |
Round-Trip Time (RTT) [ms] | Integer | Server-side measured latency between client and server | 1 - 8600000 |
Login Successful | Boolean | True: Login was successful, False: Login failed | (true, false) |
Is Attack IP | Boolean | IP address was found in known attacker data set | (true, false) |
Is Account Takeover | Boolean | Login attempt was identified as account takeover by incident response team of the online service | (true, false) |
Data Creation
As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All the other data was randomly generated while maintaining logical relations and temporal order between the features.
The timestamps, however, are not identical and contain randomness. The feature values related to IP address and user agent string were randomly generated from publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation. Therefore, the RTT entries were probably in other positions in the original data set.
The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.
The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical to those in the real data set.
The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.
Regarding the Data Values
Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.
You can recognize them by the following values:
ASNs with values >= 500.000
IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)
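The two criteria above can be checked with Python's standard `ipaddress` module. The following is a small sketch; the function name `is_placeholder` and the constant names are hypothetical, and the thresholds are taken directly from the two rules listed above.

```python
import ipaddress

# Thresholds taken from the data set description
SYNTHETIC_NET = ipaddress.ip_network("10.0.0.0/8")
SYNTHETIC_ASN_MIN = 500_000


def is_placeholder(ip, asn):
    """Return True if the IP address or ASN is one of the unrealistic values.

    ip  -- IP address as a string, e.g. "10.1.2.3"
    asn -- autonomous system number as an integer
    """
    return asn >= SYNTHETIC_ASN_MIN or ipaddress.ip_address(ip) in SYNTHETIC_NET


print(is_placeholder("10.1.2.3", 1234))    # True: address inside 10.0.0.0/8
print(is_placeholder("8.8.8.8", 500_000))  # True: ASN at or above 500.000
print(is_placeholder("8.8.8.8", 15169))    # False
```

Applied row by row to the IP Address and ASN features, this separates the placeholder entries from the rest if an analysis needs to treat them differently.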
Study Reproduction
Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.
The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.
See RESULTS.md for more details.
Ethics
By using the SSO service, the users agreed to the data collection and evaluation for research purposes. For study reproduction and fostering RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.
The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.
Publication
You can find more details on our conducted study in the following journal article:
Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022)
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono.
ACM Transactions on Privacy and Security
Bibtex
@article{Wiefling_Pump_2022, author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi}, title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}}, journal = {{ACM} {Transactions} on {Privacy} and {Security}}, doi = {10.1145/3546069}, publisher = {ACM}, year = {2022} }
License
This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069
[1] A few (invalid) user agent strings from the original data set could not be parsed, so their device type is empty. Perhaps this parse error is useful information for your studies, so we kept these 1526 entries.
The United States Office of the Geographer provides the Large Scale International Boundary (LSIB) dataset. The detailed version (2013) is derived from two other datasets: a LSIB line vector file and the World Vector Shorelines (WVS) from the National Geospatial-Intelligence Agency (NGA). The interior boundaries reflect U.S. government policies on boundaries, boundary disputes, and sovereignty. The exterior boundaries are derived from the WVS; however, the WVS coastline data is outdated and generally shifted from between several hundred meters to over a kilometer. Each feature is the polygonal area enclosed by interior boundaries and exterior coastlines where applicable, and many countries consist of multiple features, one per disjoint region. Compared with the detailed LSIB, in this simplified dataset some disjointed regions of each country have been reduced to a single feature. Furthermore, it excludes medium and smaller islands. The resulting simplified boundary lines are rarely shifted by more than 100 meters from the detailed LSIB lines. Each of the 312 features is a part of the geometry of one of the 284 countries described in this dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
The Global Air Quality Data dataset provides an extensive compilation of air quality measurements from various prominent cities worldwide. This dataset includes crucial environmental indicators such as particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), and ozone (O3), along with meteorological data like temperature, humidity, and wind speed. With 10,000 records, this dataset is ideal for researchers, data scientists, and policy makers looking to analyze air quality trends, understand the impact of pollution on health, and develop strategies for environmental improvement.
The dataset is composed of the following columns:
City: The name of the city where the air quality measurement was taken.
Country: The country in which the city is located.
Date: The date when the measurement was recorded.
PM2.5: The concentration of fine particulate matter with a diameter of less than 2.5 micrometers (µg/m³).
PM10: The concentration of particulate matter with a diameter of less than 10 micrometers (µg/m³).
NO2: The concentration of nitrogen dioxide (µg/m³).
SO2: The concentration of sulfur dioxide (µg/m³).
CO: The concentration of carbon monoxide (mg/m³).
O3: The concentration of ozone (µg/m³).
Temperature: The temperature at the time of measurement (°C).
Humidity: The humidity level at the time of measurement (%).
Wind Speed: The wind speed at the time of measurement (m/s).
https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description
This dataset consists of academic and demographic information about 300 students from a university, which can be used for predicting academic outcomes, such as probation status. The dataset was simulated to represent a variety of student attributes across multiple categories like personal data, academic history, and other related information. The primary goal of this dataset is to analyze factors contributing to academic performance and identify students at risk of probation.
Column Descriptions
Student No.: (Numeric) A unique identifier for each student. In this dataset, each student has a different ID number, making it a 100% unique column.
Cohort: (Numeric) The year a student enrolled in the university. No missing values and consistent across the dataset.
College: (Nominal) The name of the college the student belongs to. Examples include "Engineering," "Science," etc. No missing values.
College Code: (Nominal) A numerical or alphanumerical code representing the college. This is an alternative representation of the "College" column.
Major: (Nominal) The major field of study of the student. Some missing values (23%) represent students who haven’t declared a major or are in an undeclared status.
Major Code: (Nominal) A code representing the major subject. Similar to the "Major" column, this has 23% missing values due to undeclared majors.
Minor: (Nominal) The minor subject, if any, chosen by the student. This column has a high percentage of missing data (91%) since most students do not have minors.
Spec: (Nominal) Specialization within the major field of study. Like the "Minor" column, this has a high share of missing data (93%) as most students do not declare a specialization.
Degree: (Numeric) The type of degree the student is pursuing (e.g., Bachelor's). In this dataset, all students are pursuing the same degree, so there are no missing values.
Status: (Nominal) The current academic standing of the student (e.g., "Active," "Inactive"). No missing values.
Load Status: (Nominal) The academic load status (e.g., "Full-time," "Part-time"). This column has very few missing values (1%).
Gender: (Nominal) The gender of the student (e.g., "Male," "Female"). No missing values.
Country: (Nominal) The country of origin of the student. Only 2 missing values, making it nearly complete.
Governorate: (Nominal) The administrative region (governorate) the student comes from. This column has a small percentage of missing values (1%).
Wellayah: (Nominal) The district or locality within the governorate. Around 1% of the data is missing.
CGPA: (Numeric) The cumulative grade point average (CGPA) of the student. This field has 145 missing values, representing students without available CGPA records.
Estimated Graduation Year: (Numeric) The expected year in which the student will graduate. No missing values.
From HEAC: (Nominal) Indicates whether the student was admitted through the Higher Education Admission Center (HEAC). This column has 4% missing values.
Admission Category: (Nominal) The category of admission (e.g., scholarship, self-funded). This column has a significant amount of missing data (98%), indicating that admission category data is either unavailable or irrelevant for most students.
Birth Date: (Nominal) The birth date of the student. The dataset includes very few missing values (0%) and has been replaced by the derived feature "Age."
Actual Graduation Date: (Nominal) The actual date on which a student graduates. More than half of the values are missing (54%), representing students who haven’t graduated yet.
Withdrawal: (Nominal) Indicates whether the student has withdrawn from the university. This column has 89% missing data since the majority of students haven’t withdrawn.
Marital Status: (Nominal) The marital status of the student (e.g., "Single," "Married"). No missing values.
SQU Hostel: (Nominal) Indicates whether the student lives in the university hostel. No missing values.
Percentage (Secondary School Score): (Nominal) The student’s percentage score from secondary school. No missing values.
Probation Student: (Nominal) Indicates whether the student is under academic probation. This is the target variable for classification, with no missing values.
Record Details
Total Records: 300
Total Attributes: 26
Missing Values: Some columns have a significant proportion of missing data (e.g., Minor, Spec, Major Code), while others have very few or no missing values (e.g., Gender, Cohort, College). Missing values were handled using a placeholder for clarity in certain columns.
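The per-column missing-value percentages quoted in the column descriptions can be recomputed with a few lines of Python. The following is only a minimal sketch: the four-row sample, the `missing_share` helper, and the assumption that a missing value appears as an empty cell are all illustrative and are not taken from the actual file, which may use a different placeholder.

```python
import csv
import io


def missing_share(rows, columns):
    """Return {column: fraction of rows whose value is empty or absent}."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for row in rows if not (row.get(col) or "").strip()) / total
        for col in columns
    }


# Hypothetical four-row sample; the values are made up for illustration.
SAMPLE_CSV = """Student No.,Major,Minor,Gender
1001,Physics,,Female
1002,,,Male
1003,Chemistry,Maths,Female
1004,Biology,,Male
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = missing_share(rows, ["Student No.", "Major", "Minor", "Gender"])
for column, share in shares.items():
    print(f"{column}: {share:.0%} missing")
```

If the real file marks missing values with a placeholder string rather than an empty cell, the emptiness check inside `missing_share` would need to compare against that placeholder instead.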
Success.ai’s Construction Data for Building Materials & Construction Industry Leaders in Europe provides a reliable dataset tailored for businesses seeking to connect with leaders in the European construction and building materials sectors. Covering contractors, suppliers, architects, and project managers, this dataset offers verified profiles, firmographic insights, and decision-maker contacts.
With access to over 700 million verified global profiles and data from 70 million businesses, Success.ai ensures that your outreach, market analysis, and strategic partnerships are powered by accurate, continuously updated, and AI-validated information. Backed by our Best Price Guarantee, this solution empowers you to engage effectively with the construction industry across Europe.
Why Choose Success.ai’s Construction Data?
Verified Contact Data for Industry Leaders
Comprehensive Coverage Across Europe’s Construction Sector
Continuously Updated Datasets
Ethical and Compliant
Data Highlights:
Key Features of the Dataset:
Leadership Profiles in Construction
Advanced Filters for Precision Campaigns
Firmographic Insights and Project Data
AI-Driven Enrichment
Strategic Use Cases:
Sales and Vendor Development
Market Research and Competitive Analysis
Partnership Development and Supply Chain Optimization
Recruitment and Workforce Solutions
Why Choose Success.ai?
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset, "World Important Events - Ancient to Modern," spans significant historical milestones from ancient times to the modern era, covering diverse global incidents. It provides a comprehensive timeline of events that have shaped the world, offering insights into wars, cultural shifts, technological advancements, and social movements.
Column Descriptions:
Sl. No: Serial number.
Name of Incident: Title of the event.
Date, Month, Year: When the event occurred.
Country: Where it happened.
Type of Event: Nature of the event (e.g., War, Revolution).
Place Name: Specific location of the event.
Impact: Brief description of the event's impact.
Affected Population: Who was impacted.
Important Person/Group Responsible: Key figures or groups involved.
Outcome: Result of the event (Positive, Negative, Mixed).
Leverage this dataset for data analytics, cleaning, and visualization to uncover insights. Ideal for historical analysis and educational exploration, it offers a rich foundation for research, storytelling, and understanding global events' impact.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2021 estimates that classify usual residents in England and Wales by country of birth and by ethnic group. The estimates are as at Census Day, 21 March 2021.
Area type
Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.
For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.
Coverage
Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:
Country of birth
The country in which a person was born.
For people not born in one of the four parts of the UK, there was an option to select "elsewhere".
People who selected "elsewhere" were asked to write in the current name for their country of birth.
Ethnic group
The ethnic group that the person completing the census feels they belong to. This could be based on their culture, family background, identity or physical appearance.
Respondents could choose one out of 19 tick-box response categories, including write-in response options.
Overview
The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. The current edition is version 11.4 (published 24 February 2025). The 11.4 release contains updated boundary lines and data refinements designed to extend the functionality of the dataset. These data and generalized derivatives are the only international boundary lines approved for U.S. Government use. The contents of this dataset reflect U.S. Government policy on international boundary alignment, political recognition, and dispute status. They do not necessarily reflect de facto limits of control.
National Geospatial Data Asset
This dataset is a National Geospatial Data Asset (NGDAID 194) managed by the Department of State. It is a part of the International Boundaries Theme created by the Federal Geographic Data Committee.
Dataset Source Details
Sources for these data include treaties, relevant maps, and data from boundary commissions, as well as national mapping agencies. Where available and applicable, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery process includes analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.
Cartographic Visualization
The LSIB is a geospatial dataset that, when used for cartographic purposes, requires additional styling. The LSIB download package contains example style files for commonly used software applications. The attribute table also contains embedded information to guide the cartographic representation. Additional discussion of these considerations can be found in the Use of Core Attributes in Cartographic Visualization section below.
Additional cartographic information pertaining to the depiction and description of international boundaries or areas of special sovereignty can be found in Guidance Bulletins published by the Office of the Geographer and Global Issues: https://data.geodata.state.gov/guidance/index.html
Contact
Direct inquiries to internationalboundaries@state.gov.
Direct download: https://data.geodata.state.gov/LSIB.zip
Attribute Structure
The dataset uses the following attributes divided into two categories:
ATTRIBUTE NAME | ATTRIBUTE STATUS
CC1 | Core
CC1_GENC3 | Extension
CC1_WPID | Extension
COUNTRY1 | Core
CC2 | Core
CC2_GENC3 | Extension
CC2_WPID | Extension
COUNTRY2 | Core
RANK | Core
LABEL | Core
STATUS | Core
NOTES | Core
LSIB_ID | Extension
ANTECIDS | Extension
PREVIDS | Extension
PARENTID | Extension
PARENTSEG | Extension
These attributes have external data sources that update separately from the LSIB:
ATTRIBUTE NAME | ATTRIBUTE STATUS
CC1 | GENC
CC1_GENC3 | GENC
CC1_WPID | World Polygons
COUNTRY1 | DoS Lists
CC2 | GENC
CC2_GENC3 | GENC
CC2_WPID | World Polygons
COUNTRY2 | DoS Lists
LSIB_ID | BASE
ANTECIDS | BASE
PREVIDS | BASE
PARENTID | BASE
PARENTSEG | BASE
The core attributes listed above describe the boundary lines contained within the LSIB dataset. Removal of core attributes from the dataset will change the meaning of the lines. An attribute status of “Extension” represents a field containing data interoperability information. Other attributes not listed above include “FID”, “Shape_length” and “Shape.” These are components of the shapefile format and do not form an intrinsic part of the LSIB.
Core Attributes
The eight core attributes listed above contain unique information which, when combined with the line geometry, comprise the LSIB dataset. These Core Attributes are further divided into Country Code and Name Fields and Descriptive Fields.
Country Code and Country Name Fields
“CC1” and “CC2” fields are machine-readable fields that contain political entity codes.
These are two-character codes derived from the Geopolitical Entities, Names, and Codes Standard (GENC), Edition 3 Update 18. “CC1_GENC3” and “CC2_GENC3” fields contain the corresponding three-character GENC codes and are extension attributes discussed below. The codes “Q2” or “QX2” denote a line in the LSIB representing a boundary associated with areas not contained within the GENC standard.
The “COUNTRY1” and “COUNTRY2” fields contain the names of corresponding political entities. These fields contain names approved by the U.S. Board on Geographic Names (BGN) as incorporated in the "Independent States in the World" and "Dependencies and Areas of Special Sovereignty" lists maintained by the Department of State. To ensure maximum compatibility, names are presented without diacritics and certain names are rendered using common cartographic abbreviations. Names for lines associated with the code "Q2" are descriptive and not necessarily BGN-approved. Names rendered in all CAPITAL LETTERS denote independent states. Names rendered in normal text represent dependencies, areas of special sovereignty, or are otherwise presented for the convenience of the user.
Descriptive Fields
The following text fields are a part of the core attributes of the LSIB dataset and do not update from external sources. They provide additional information about each of the lines and are as follows:
ATTRIBUTE NAME | CONTAINS NULLS
RANK | No
STATUS | No
LABEL | Yes
NOTES | Yes
Neither the "RANK" nor "STATUS" fields contain null values; the "LABEL" and "NOTES" fields do. The "RANK" field is a numeric expression of the "STATUS" field. Combined with the line geometry, these fields encode the views of the United States Government on the political status of the boundary line.
ATTRIBUTE NAME | VALUE
RANK | 1 | 2 | 3
STATUS | International Boundary | Other Line of International Separation | Special Line
A value of “1” in the “RANK” field corresponds to an "International Boundary" value in the “STATUS” field. Values of “2” and “3” correspond to “Other Line of International Separation” and “Special Line,” respectively.
The “LABEL” field contains required text to describe the line segment on all finished cartographic products, including but not limited to print and interactive maps.
The “NOTES” field contains an explanation of special circumstances modifying the lines. This information can pertain to the origins of the boundary lines, limitations regarding the purpose of the lines, or the original source of the line.
Use of Core Attributes in Cartographic Visualization
Several of the Core Attributes provide information required for the proper cartographic representation of the LSIB dataset. The cartographic usage of the LSIB requires a visual differentiation between the three categories of boundary lines. Specifically, this differentiation must be between:
International Boundaries (Rank 1);
Other Lines of International Separation (Rank 2); and
Special Lines (Rank 3).
Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the “LABEL” field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Please consult the style files in the download package for examples of this depiction.
The requirement to incorporate the contents of the "LABEL" field on cartographic products is scale dependent. If a label is legible at the scale of a given static product, a proper use of this dataset would encourage the application of that label.
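The RANK and STATUS correspondence and the prominence ordering described above can be expressed as a small consistency check. This is only a sketch under stated assumptions: it treats each attribute record as a plain Python dict with an integer "RANK", and the line widths are hypothetical values chosen to show the required ordering; they are not taken from the style files in the download package.

```python
# RANK-to-STATUS pairs as stated in the attribute description.
RANK_TO_STATUS = {
    1: "International Boundary",
    2: "Other Line of International Separation",
    3: "Special Line",
}

# Hypothetical line widths (not from the official style files); only the
# ordering matters: Rank 1 most prominent, Rank 3 least prominent.
EXAMPLE_LINE_WIDTH = {1: 1.5, 2: 1.0, 3: 0.5}


def check_rank_status(record):
    """Return a list of problems found in one attribute record (a dict)."""
    problems = []
    rank = record.get("RANK")
    status = record.get("STATUS")
    if rank not in RANK_TO_STATUS:
        problems.append(f"unexpected RANK: {rank!r}")
    elif status != RANK_TO_STATUS[rank]:
        problems.append(f"STATUS {status!r} does not match RANK {rank}")
    return problems


def needs_label(record):
    """True for Rank 2 or 3 lines that carry non-empty LABEL text."""
    return record.get("RANK") in (2, 3) and bool(record.get("LABEL"))


# Made-up record used only to exercise the helpers.
example = {"RANK": 2, "STATUS": "Other Line of International Separation",
           "LABEL": "Example label text"}
print(check_rank_status(example), needs_label(example))
```

Whether a label is actually drawn still depends on map scale, which this sketch does not model.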
Using the contents of the "COUNTRY1" and "COUNTRY2" fields in the generation of a line segment label is not required. The "STATUS" field contains the preferred description for the three LSIB line types when they are incorporated into a map legend but is otherwise not to be used for labeling. Use of the “CC1,” “CC1_GENC3,” “CC2,” “CC2_GENC3,” “RANK,” or “NOTES” fields for cartographic labeling purposes is prohibited.
Extension Attributes
Certain elements of the attributes within the LSIB dataset extend data functionality to make the data more interoperable or to provide clearer linkages to other datasets.
The fields “CC1_GENC3” and “CC2_GENC3” contain the corresponding three-character GENC code to the “CC1” and “CC2” attributes. The code “QX2” is the three-character counterpart of the code “Q2,” which denotes a line in the LSIB representing a boundary associated with a geographic area not contained within the GENC standard.
To allow for linkage between individual lines in the LSIB and the World Polygons dataset, the “CC1_WPID” and “CC2_WPID” fields contain a Universally Unique Identifier (UUID), version 4, which provides a stable description of each geographic entity in a boundary pair relationship. Each UUID corresponds to a geographic entity listed in the World Polygons dataset.
Five additional fields in the LSIB expand on the UUID concept and either describe features that have changed across space and time or indicate relationships between previous versions of the feature.
The “LSIB_ID” attribute is a UUID value that defines a specific instance of a feature. Any change to the feature in a lineset requires a new “LSIB_ID.”
The “ANTECIDS,” or antecedent ID, is a UUID that references line geometries from which a given line is descended in time. It is used when there is a feature that is entirely new, not when there is a new version of a previous feature.
This is generally used to reference countries that have dissolved. The “PREVIDS,” or Previous ID, is a UUID field that contains old versions of a line. This is an additive field that houses all Previous IDs. A new version of a feature is defined by any change to the