15 datasets found

Large Scale International Boundaries
catalog.data.gov
geodata.state.gov
Updated May 23, 2025
Cite
U.S. Department of State (Point of Contact) (2025). Large Scale International Boundaries [Dataset]. https://catalog.data.gov/dataset/large-scale-international-boundaries
Explore at:
Dataset updated
May 23, 2025
Dataset provided by
United States Department of State (http://state.gov/)
Description
Overview

The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. The current edition is version 11.4 (published 24 February 2025). The 11.4 release contains updated boundary lines and data refinements designed to extend the functionality of the dataset. These data and generalized derivatives are the only international boundary lines approved for U.S. Government use. The contents of this dataset reflect U.S. Government policy on international boundary alignment, political recognition, and dispute status. They do not necessarily reflect de facto limits of control.

National Geospatial Data Asset

This dataset is a National Geospatial Data Asset (NGDAID 194) managed by the Department of State. It is a part of the International Boundaries Theme created by the Federal Geographic Data Committee.

Dataset Source Details

Sources for these data include treaties, relevant maps, and data from boundary commissions, as well as national mapping agencies. Where available and applicable, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery process includes analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.

Cartographic Visualization

The LSIB is a geospatial dataset that, when used for cartographic purposes, requires additional styling. The LSIB download package contains example style files for commonly used software applications. The attribute table also contains embedded information to guide the cartographic representation. Additional discussion of these considerations can be found in the Use of Core Attributes in Cartographic Visualization section below.
Additional cartographic information pertaining to the depiction and description of international boundaries or areas of special sovereignty can be found in Guidance Bulletins published by the Office of the Geographer and Global Issues: https://data.geodata.state.gov/guidance/index.html

Contact

Direct inquiries to internationalboundaries@state.gov.

Direct download: https://data.geodata.state.gov/LSIB.zip

Attribute Structure

The dataset uses the following attributes divided into two categories:

ATTRIBUTE NAME | ATTRIBUTE STATUS
CC1 | Core
CC1_GENC3 | Extension
CC1_WPID | Extension
COUNTRY1 | Core
CC2 | Core
CC2_GENC3 | Extension
CC2_WPID | Extension
COUNTRY2 | Core
RANK | Core
LABEL | Core
STATUS | Core
NOTES | Core
LSIB_ID | Extension
ANTECIDS | Extension
PREVIDS | Extension
PARENTID | Extension
PARENTSEG | Extension

These attributes have external data sources that update separately from the LSIB:

ATTRIBUTE NAME | ATTRIBUTE STATUS
CC1 | GENC
CC1_GENC3 | GENC
CC1_WPID | World Polygons
COUNTRY1 | DoS Lists
CC2 | GENC
CC2_GENC3 | GENC
CC2_WPID | World Polygons
COUNTRY2 | DoS Lists
LSIB_ID | BASE
ANTECIDS | BASE
PREVIDS | BASE
PARENTID | BASE
PARENTSEG | BASE

The core attributes listed above describe the boundary lines contained within the LSIB dataset. Removal of core attributes from the dataset will change the meaning of the lines. An attribute status of “Extension” represents a field containing data interoperability information. Other attributes not listed above include “FID”, “Shape_length” and “Shape.” These are components of the shapefile format and do not form an intrinsic part of the LSIB.

Core Attributes

The eight core attributes listed above contain unique information which, when combined with the line geometry, comprise the LSIB dataset. These Core Attributes are further divided into Country Code and Name Fields and Descriptive Fields.

Country Code and Country Name Fields

“CC1” and “CC2” fields are machine readable fields that contain political entity codes.
These are two-character codes derived from the Geopolitical Entities, Names, and Codes Standard (GENC), Edition 3 Update 18. “CC1_GENC3” and “CC2_GENC3” fields contain the corresponding three-character GENC codes and are extension attributes discussed below. The codes “Q2” or “QX2” denote a line in the LSIB representing a boundary associated with areas not contained within the GENC standard.

The “COUNTRY1” and “COUNTRY2” fields contain the names of corresponding political entities. These fields contain names approved by the U.S. Board on Geographic Names (BGN) as incorporated in the "Independent States in the World" and "Dependencies and Areas of Special Sovereignty" lists maintained by the Department of State. To ensure maximum compatibility, names are presented without diacritics and certain names are rendered using common cartographic abbreviations. Names for lines associated with the code "Q2" are descriptive and not necessarily BGN-approved. Names rendered in all CAPITAL LETTERS denote independent states. Names rendered in normal text represent dependencies, areas of special sovereignty, or are otherwise presented for the convenience of the user.

Descriptive Fields

The following text fields are a part of the core attributes of the LSIB dataset and do not update from external sources. They provide additional information about each of the lines and are as follows:

ATTRIBUTE NAME | CONTAINS NULLS
RANK | No
STATUS | No
LABEL | Yes
NOTES | Yes

Neither the "RANK" nor "STATUS" fields contain null values; the "LABEL" and "NOTES" fields do. The "RANK" field is a numeric expression of the "STATUS" field. Combined with the line geometry, these fields encode the views of the United States Government on the political status of the boundary line.
ATTRIBUTE NAME | VALUE | VALUE | VALUE
RANK | 1 | 2 | 3
STATUS | International Boundary | Other Line of International Separation | Special Line

A value of “1” in the “RANK” field corresponds to an "International Boundary" value in the “STATUS” field. Values of “2” and “3” correspond to “Other Line of International Separation” and “Special Line,” respectively. The “LABEL” field contains required text to describe the line segment on all finished cartographic products, including but not limited to print and interactive maps. The “NOTES” field contains an explanation of special circumstances modifying the lines. This information can pertain to the origins of the boundary lines, limitations regarding the purpose of the lines, or the original source of the line.

Use of Core Attributes in Cartographic Visualization

Several of the Core Attributes provide information required for the proper cartographic representation of the LSIB dataset. The cartographic usage of the LSIB requires a visual differentiation between the three categories of boundary lines. Specifically, this differentiation must be between: International Boundaries (Rank 1); Other Lines of International Separation (Rank 2); and Special Lines (Rank 3). Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the “LABEL” field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Please consult the style files in the download package for examples of this depiction.

The requirement to incorporate the contents of the "LABEL" field on cartographic products is scale dependent. If a label is legible at the scale of a given static product, a proper use of this dataset would encourage the application of that label.
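The RANK/STATUS correspondence and the prominence ordering described above can be sketched in a few lines of Python. This is only an illustration: the attribute dictionaries, the function names `check_feature` and `needs_label`, and the stroke widths in `LINE_WIDTH` are hypothetical and are not taken from the style files in the LSIB download package. Only the rank-to-status mapping and the ordering Rank 1 > Rank 2 > Rank 3 come from the description.

```python
# Mapping stated in the description: RANK is a numeric expression of STATUS.
RANK_TO_STATUS = {
    1: "International Boundary",
    2: "Other Line of International Separation",
    3: "Special Line",
}

# Hypothetical stroke widths; only the ordering (1 > 2 > 3) comes from the text.
LINE_WIDTH = {1: 1.5, 2: 1.0, 3: 0.5}


def check_feature(attrs):
    """Return a list of problems found in one feature's attribute dict."""
    problems = []
    rank = attrs.get("RANK")
    status = attrs.get("STATUS")
    if rank not in RANK_TO_STATUS:
        problems.append(f"unexpected RANK: {rank!r}")
    elif status != RANK_TO_STATUS[rank]:
        problems.append(f"STATUS {status!r} does not match RANK {rank}")
    return problems


def needs_label(attrs):
    """Rank 2 and 3 lines are labeled from LABEL (where scale permits)."""
    return attrs.get("RANK") in (2, 3) and bool(attrs.get("LABEL"))
```

A check like this could be run over the attribute table before styling, so that a line whose RANK and STATUS disagree is caught before it is drawn with the wrong prominence.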
Using the contents of the "COUNTRY1" and "COUNTRY2" fields in the generation of a line segment label is not required. The "STATUS" field contains the preferred description for the three LSIB line types when they are incorporated into a map legend but is otherwise not to be used for labeling. Use of the “CC1,” “CC1_GENC3,” “CC2,” “CC2_GENC3,” “RANK,” or “NOTES” fields for cartographic labeling purposes is prohibited.

Extension Attributes

Certain elements of the attributes within the LSIB dataset extend data functionality to make the data more interoperable or to provide clearer linkages to other datasets. The fields “CC1_GENC3” and “CC2_GENC3” contain the three-character GENC codes corresponding to the “CC1” and “CC2” attributes. The code “QX2” is the three-character counterpart of the code “Q2,” which denotes a line in the LSIB representing a boundary associated with a geographic area not contained within the GENC standard.

To allow for linkage between individual lines in the LSIB and the World Polygons dataset, the “CC1_WPID” and “CC2_WPID” fields contain a Universally Unique Identifier (UUID), version 4, which provides a stable description of each geographic entity in a boundary pair relationship. Each UUID corresponds to a geographic entity listed in the World Polygons dataset.

Five additional fields in the LSIB expand on the UUID concept and either describe features that have changed across space and time or indicate relationships between previous versions of the feature.

The “LSIB_ID” attribute is a UUID value that defines a specific instance of a feature. Any change to the feature in a lineset requires a new “LSIB_ID.”

The “ANTECIDS,” or antecedent ID, is a UUID that references line geometries from which a given line is descended in time. It is used when there is a feature that is entirely new, not when there is a new version of a previous feature.
This is generally used to reference countries that have dissolved. The “PREVIDS,” or Previous ID, is a UUID field that contains old versions of a line. This is an additive field that houses all Previous IDs. A new version of a feature is defined by any change to the
Country Codes
public.opendatasoft.com
data.smartidf.services
csv, excel, geojson +1
Updated Aug 25, 2015
Cite
(2015). Country Codes [Dataset]. https://public.opendatasoft.com/explore/dataset/countries-codes/
Explore at:
Available download formats: geojson, json, excel, csv
Dataset updated
Aug 25, 2015
License
Public domain (https://en.wikipedia.org/wiki/Public_domain)
Description
Country codes: ISO 2, ISO 3, UN, LANG, LABEL (EN, FR, SP)
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
csv, excel, geojson +1
Updated Mar 10, 2024
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000)

Sources and Contributions

Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
Data from: Caravan - A global community dataset for large-sample hydrology
biorxiv.org
explore.openaire.eu
bin, png, zip
Updated Jul 12, 2024
Cite
Frederik Kratzert; Grey Nearing; Nans Addor; Tyler Erickson; Martin Gauch; Oren Gilon; Lukas Gudmundsson; Avinatan Hassidim; Daniel Klotz; Sella Nevo; Guy Shalev; Yossi Matias (2024). Caravan - A global community dataset for large-sample hydrology [Dataset]. http://doi.org/10.5281/zenodo.7540792
Explore at:
Available download formats: png, zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7540792
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Frederik Kratzert; Grey Nearing; Nans Addor; Tyler Erickson; Martin Gauch; Oren Gilon; Lukas Gudmundsson; Avinatan Hassidim; Daniel Klotz; Sella Nevo; Guy Shalev; Yossi Matias
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
THIS IS A PRE-RELEASE, WHILE THE CARAVAN IS UNDER REVISION.

Check out the preprint at: https://eartharxiv.org/repository/view/3345/ (accepted for publication at Nature Scientific Data).

Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.

Change Log:

23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.

24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.

15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).

1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), now totalling 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added some more metadata (station name and country).

16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan
Caravan - A global community dataset for large-sample hydrology (csv...
zenodo.org
application/gzip, zip
Updated May 27, 2025
Cite
Frederik Kratzert; Grey Nearing; Nans Addor; Tyler Erickson; Martin Gauch; Oren Gilon; Lukas Gudmundsson; Avinatan Hassidim; Daniel Klotz; Sella Nevo; Guy Shalev; Yossi Matias (2025). Caravan - A global community dataset for large-sample hydrology (csv version) [Dataset]. http://doi.org/10.5281/zenodo.15530022
Explore at:
Available download formats: zip, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.15530022
Dataset updated
May 27, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Frederik Kratzert; Grey Nearing; Nans Addor; Tyler Erickson; Martin Gauch; Oren Gilon; Lukas Gudmundsson; Avinatan Hassidim; Daniel Klotz; Sella Nevo; Guy Shalev; Yossi Matias
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the accompanying dataset to the following paper https://www.nature.com/articles/s41597-023-01975-w

Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.

If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, out of respect for the work that went into creating them and that made Caravan possible in the first place.

All current development and additional community extensions can be found at https://github.com/kratzert/Caravan

IMPORTANT: Due to size limitations for individual repositories, the netCDF version and the CSV version of Caravan (since Version 1.6) are split into two different repositories. You can find the netCDF version at https://zenodo.org/records/14673536

Change Log:

23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.

24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.

15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).

1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), now totalling 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added some more metadata (station name and country).

16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan

10 May 2023: Version 1.1 - No data change, only an updated data description.

17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.

16 April 2024: Version 1.4 - Added 9130 gauges from the original source dataset that were initially not included because of the area thresholds (i.e. basins smaller than 100 sq km or larger than 2000 sq km). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two different download options that include timeseries data only as either csv files (Caravan-csv.tar.xz) or netcdf files (Caravan-nc.tar.xz). Including the large basins also required an update in the Earth Engine code.

16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climate indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").

27 May 2025: Version 1.6

Updated the CAMELS-AUS data to source from CAMELS-AUS v2. This means more basins (561 compared to 222) and more recent streamflow data (2022 compared to 2014). Note that the gauge id for four basins changed between the original CAMELS-AUS version and v2. Those gauges are ['camelsaus_224213A', 'camelsaus_224214A', 'camelsaus_227225A', 'camelsaus_403213A'], all of which lost their trailing "A". To stay synced with CAMELS-AUS (v2), we also adopted the new naming.

Added VERSION file to the root directory that contains the current version number.

Updated the code to the most recent GitHub snapshot (commit 6eab036).

Due to the 50GB repository limit, we had to split the netCDF version and the CSV version into two separate repositories. The CSV version can be found under https://zenodo.org/records/15530021
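
Code written against gauge ids from before Version 1.6 may still hold the four old CAMELS-AUS ids listed above. The following is a minimal sketch of how such ids could be translated, assuming the only change is the dropped trailing "A" as the notes state. The helper name `migrate_gauge_id` is hypothetical and not part of the Caravan code.

```python
# The four gauge ids named in the Version 1.6 notes (old naming).
RENAMED_GAUGES = [
    "camelsaus_224213A",
    "camelsaus_224214A",
    "camelsaus_227225A",
    "camelsaus_403213A",
]

# Old id -> new id: per the notes, each one lost its trailing "A".
OLD_TO_NEW = {old: old[:-1] for old in RENAMED_GAUGES}


def migrate_gauge_id(gauge_id):
    """Return the v2 id for a renamed gauge, or the id unchanged otherwise."""
    return OLD_TO_NEW.get(gauge_id, gauge_id)
```

Using an explicit lookup table rather than stripping every trailing "A" keeps ids that were not renamed untouched.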
The big dataset of ultra-marathon running
kaggle.com
Updated Jul 12, 2023
Cite
David (2023). The big dataset of ultra-marathon running [Dataset]. https://www.kaggle.com/datasets/aiaiaidavid/the-big-dataset-of-ultra-marathon-running
Explore at:
Dataset updated
Jul 12, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
David
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
According to Wikipedia, an ultramarathon, also called ultra distance or ultra running, is any footrace longer than the traditional marathon length of 42.195 kilometres (26 mi 385 yd). Various distances are raced competitively, from the shortest common ultramarathon of 31 miles (50 km) to over 200 miles (320 km). 50k and 100k are both World Athletics record distances, but some 100 miles (160 km) races are among the oldest and most prestigious events, especially in North America.

The data in this file is a large collection of ultra-marathon race records registered between 1798 and 2022 (a period of well over two centuries), which makes it a formidable long-term sample. All data was obtained from public websites.

Although the original data is in the public domain, the race records, which originally contained the athletes' names, have been anonymized to comply with data protection laws and to preserve the athletes' privacy. However, a column Athlete ID has been created with a numerical ID representing each unique runner (so if Antonio Fernández participated in 5 races over different years, then the corresponding race records now hold his unique Athlete ID instead of his name). This way I have preserved valuable information.

The dataset contains 7,461,226 ultra-marathon race records from 1,641,168 unique athletes.

The following columns (with data types) are included:

Year of event (int64)

Event dates (object)

Event name (object)

Event distance/length (object)

Event number of finishers (int64)

Athlete performance (object)

Athlete club (object)

Athlete country (object)

Athlete year of birth (float64)

Athlete gender (object)

Athlete age category (object)

Athlete average speed (object)

Athlete ID (int64)

The Event name column includes country location information that can be extracted into a new column; similarly, seasonal information beyond the Year of event can be found in the Event dates column (both can be extracted with a bit of processing).

The Event distance/length column describes the type of race, covering the most popular UM race distances and lengths, and some other specific modalities (multi-day, etc.):

Distances: 50km, 100km, 50mi, 100mi

Lengths: 6h, 12h, 24h, 48h, 72h, 6d, 10d

Additionally, other columns contain information on age, gender and speed (in km/h).
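
The split between fixed-distance and fixed-duration races described above can be sketched as a small parser. It assumes the values in the Event distance/length column look exactly like the examples listed (a number followed by km, mi, h or d); the actual column may contain other spellings, which this sketch returns as `None`. The function name `classify_event` is hypothetical.

```python
import re

# Units from the description: km and mi are distances, h and d are durations.
_EVENT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(km|mi|h|d)\s*$", re.IGNORECASE)
_KIND = {"km": "distance", "mi": "distance", "h": "duration", "d": "duration"}


def classify_event(label):
    """Return (kind, value, unit) for labels like '50km' or '24h', else None."""
    match = _EVENT.match(label)
    if match is None:
        return None  # e.g. multi-day or other formats not covered here
    value = float(match.group(1))
    unit = match.group(2).lower()
    return _KIND[unit], value, unit
```

Separating the two kinds matters because the performance of a fixed-distance race is a time, while the performance of a fixed-duration race is a distance, so the two should not be compared directly.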

Data from: Large Landing Trajectory Data Set for Go-Around Analysis

zenodo.org
data.niaid.nih.gov

application/gzip, bin +1

Updated Dec 16, 2022

Cite

Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling (2022). Large Landing Trajectory Data Set for Go-Around Analysis [Dataset]. http://doi.org/10.5281/zenodo.7148117

Explore at:

Available download formats: application/gzip, bin, zip

Unique identifier

https://doi.org/10.5281/zenodo.7148117

Dataset updated

Dec 16, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Large go-around, also referred to as missed approach, data set. The data set is in support of the paper presented at the OpenSky Symposium on November the 10th.

If you use this data for a scientific publication, please consider citing our paper.

The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:

go_arounds_minimal.csv.gz

Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:


Column name	Type	Description
time	date time	UTC time of landing or first GA attempt
icao24	string	Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
callsign	string	Aircraft identifier in air-ground communications
airport	string	ICAO airport code where the aircraft is landing
runway	string	Runway designator on which the aircraft landed
has_ga	string	"True" if at least one GA was performed, otherwise "False"
n_approaches	integer	Number of approaches identified for this flight
n_rwy_approached	integer	Number of unique runways approached by this flight

The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to drop the rows with n_approaches > 2.
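
As a minimal sketch of that filter, the snippet below uses a tiny made-up sample in place of go_arounds_minimal.csv.gz; only the column names come from the table above, and the rows are invented for illustration.

```python
import pandas as pd

# Invented sample rows; the real data would come from go_arounds_minimal.csv.gz.
df = pd.DataFrame(
    {
        "icao24": ["aaa111", "bbb222", "ccc333"],
        "has_ga": [False, True, True],
        "n_approaches": [1, 2, 7],
        "n_rwy_approached": [1, 1, 3],
    }
)

# Keep flights with at most two approaches; rows with n_approaches > 2
# (likely training or calibration flights) are dropped.
regular = df[df["n_approaches"] <= 2].reset_index(drop=True)
```

The threshold of 2 is the one suggested in the text; a stricter or looser cut-off may suit other analyses.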

go_arounds_augmented.csv.gz

Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:

Column name	Type	Description
time	date time	UTC time of landing or first GA attempt
icao24	string	Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
callsign	string	Aircraft identifier in air-ground communications
airport	string	ICAO airport code where the aircraft is landing
runway	string	Runway designator on which the aircraft landed
has_ga	string	"True" if at least one GA was performed, otherwise "False"
n_approaches	integer	Number of approaches identified for this flight
n_rwy_approached	integer	Number of unique runways approached by this flight
registration	string	Aircraft registration
typecode	string	Aircraft ICAO typecode
icaoaircrafttype	string	ICAO aircraft type
wtc	string	ICAO wake turbulence category
glide_slope_angle	float	Angle of the ILS glide slope in degrees
has_intersection	string	Boolean that is true if the runway has another runway intersecting it, otherwise false
rwy_length	float	Length of the runway in kilometres
airport_country	string	ISO Alpha-3 country code of the airport
airport_region	string	Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
operator_country	string	ISO Alpha-3 country code of the operator
operator_region	string	Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
wind_speed_knts	integer	METAR, surface wind speed in knots
wind_dir_deg	integer	METAR, surface wind direction in degrees
wind_gust_knts	integer	METAR, surface wind gust speed in knots
visibility_m	float	METAR, visibility in m
temperature_deg	integer	METAR, temperature in degrees Celsius
press_sea_level_p	float	METAR, sea level pressure in hPa
press_p	float	METAR, QNH in hPa
weather_intensity	list	METAR, list of present weather codes: qualifier - intensity
weather_precipitation	list	METAR, list of present weather codes: weather phenomena - precipitation
weather_desc	list	METAR, list of present weather codes: qualifier - descriptor
weather_obscuration	list	METAR, list of present weather codes: weather phenomena - obscuration
weather_other	list	METAR, list of present weather codes: weather phenomena - other

This data set is augmented with data from various public data sources. Aircraft related data is mostly from the OpenSky Network's aircraft data base, the METAR information is from the Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.

go_arounds_agg.csv.gz

Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

Column name	Type	Description
airport	string	ICAO airport code where the aircraft is landing
runway	string	Runway designator on which the aircraft landed
n_landings	integer	Total number of landings observed on this runway in 2019
ga_rate	float	Go-around rate, per 1000 landings
glide_slope_angle	float	Angle of the ILS glide slope in degrees
has_intersection	string	Boolean that is true if the runway has another runway intersecting it, otherwise false
rwy_length	float	Length of the runway in kilometres
airport_country	string	ISO Alpha-3 country code of the airport
airport_region	string	Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)

This aggregated data set is used in the paper for the generalized linear regression model.
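
To illustrate what a go-around rate per 1000 landings means, the sketch below aggregates a few invented per-landing rows by airport and runway. Whether the published ga_rate was computed in exactly this way is an assumption; the rows are made up, and the helper columns `is_ga` and `n_ga` are hypothetical names.

```python
import pandas as pd

# Invented per-landing rows using column names from the minimal data set.
landings = pd.DataFrame(
    {
        "airport": ["EGLC", "EGLC", "EGLC", "EGLC", "LSZH"],
        "runway": ["09", "09", "09", "27", "14"],
        "has_ga": ["False", "True", "False", "False", "False"],
    }
)

# has_ga is documented as a string, so compare against "True" explicitly.
landings["is_ga"] = landings["has_ga"].astype(str) == "True"

# One row per airport-runway pair: landing count and go-around count.
agg = (
    landings.groupby(["airport", "runway"])
    .agg(n_landings=("is_ga", "size"), n_ga=("is_ga", "sum"))
    .reset_index()
)

# Go-around rate expressed per 1000 landings.
agg["ga_rate"] = 1000 * agg["n_ga"] / agg["n_landings"]
```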

Downloading the trajectories

Users of this data set with access to OpenSky Network's Impala shell can download the historical trajectories from the historical data base with a few lines of Python code. For example, suppose you want to get all the go-arounds of the 4th of January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:

import datetime
from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic

# load minimum data set

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

# select London City Airport, go-arounds, and 2019-01-04

airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(
  tzinfo=datetime.timezone.utc
)
stop = datetime.datetime(year=2019, month=1, day=5).replace(
  tzinfo=datetime.timezone.utc
)

df_selection = df.query("airport==@airport & has_ga

Dataset of 'Mapping 10-m Industrial Lands across 1000+ Global Large Cities,...
zenodo.org
bin, zip
Updated Feb 18, 2025
Cite
Cheolhee Yoo; Yuhan Zhou; Qihao Weng (2025). Dataset of 'Mapping 10-m Industrial Lands across 1000+ Global Large Cities, 2017-2023' [Dataset]. http://doi.org/10.5281/zenodo.14832219
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.14832219
Dataset updated
Feb 18, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Cheolhee Yoo; Yuhan Zhou; Qihao Weng
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset provides high-resolution (10 m) industrial land maps for 1,093 global cities from 2017 to 2023.

The dataset includes:

10 m resolution industrial land maps for each year (GeoTIFF, Zip)

Detailed information for each city, including industrial land area per capita indicators (1093_city_information.xlsx)

Validation Package (Validation_Package.zip)

File Naming Convention

GeoTIFF files: Industrial_land_XXX_YYY_YEAR.tif

XXX: Country code

YYY: City ID

YEAR: Year of data

Example: Industrial_land_USA_634_2017.tif represents the industrial land map for Chicago, USA, in 2017.
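
The naming convention above can be parsed mechanically. The following is a minimal sketch, not part of the dataset: it assumes the country code is three uppercase letters and the city ID is a plain integer, and the names `IndustrialLandFile` and `parse_industrial_land_name` are hypothetical.

```python
import re
from typing import NamedTuple


class IndustrialLandFile(NamedTuple):
    """Fields encoded in an Industrial_land_XXX_YYY_YEAR.tif file name."""
    country_code: str
    city_id: int
    year: int


# Assumption: XXX is a three-letter uppercase code, YYY a numeric city ID,
# and YEAR a four-digit year.
_NAME_PATTERN = re.compile(r"^Industrial_land_([A-Z]{3})_(\d+)_(\d{4})\.tif$")


def parse_industrial_land_name(filename: str) -> IndustrialLandFile:
    """Split a GeoTIFF file name into country code, city ID, and year."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    country_code, city_id, year = match.groups()
    return IndustrialLandFile(country_code, int(city_id), int(year))


if __name__ == "__main__":
    print(parse_industrial_land_name("Industrial_land_USA_634_2017.tif"))
```

Parsing the name up front makes it straightforward to group the yearly maps of one city or to join them with the city information spreadsheet by city ID.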

Each TIF file has a 10 m spatial resolution with the GCS_WGS_1984 spatial projection. The maps include three classes:

Class 1: Industrial land in built-up areas

Class 2: Non-industrial land in built-up areas

Class 0: Non-built-up areas
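
As an illustration of how the three class codes might be summarised, the sketch below counts pixels per class in a small in-memory grid. It is only a sketch under stated assumptions: the grid is a toy example rather than data read from a GeoTIFF, and `class_shares` is a hypothetical helper. Any other pixel value (for example a nodata code) is ignored.

```python
from collections import Counter
from itertools import chain

# Class codes as described in the dataset documentation.
CLASS_LABELS = {
    0: "Non-built-up areas",
    1: "Industrial land in built-up areas",
    2: "Non-industrial land in built-up areas",
}


def class_shares(rows):
    """Return the share of pixels in each class for a 2-D grid of class codes."""
    counts = Counter(chain.from_iterable(rows))
    # Only the three documented classes contribute to the total.
    total = sum(counts[code] for code in CLASS_LABELS)
    if total == 0:
        raise ValueError("Grid contains no pixels with a documented class code")
    return {label: counts[code] / total for code, label in CLASS_LABELS.items()}


if __name__ == "__main__":
    toy_grid = [
        [0, 0, 1],
        [2, 2, 2],
        [0, 1, 2],
        [0, 0, 0],
    ]
    for label, share in class_shares(toy_grid).items():
        print(f"{label}: {share:.2%}")
```

Because the maps are stored in a geographic coordinate system, pixel counts give shares rather than exact areas; converting to square metres would need a latitude-dependent pixel area.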

City Information

A detailed summary of city-specific information, including the annual total industrial land area, is provided in 1093_city_information.xlsx. This file includes:

ID_HDC_G0: Unique city ID (Urban Centre)

CTR_MN_NM: Main country name

CTR_MN_ISO: ISO-3 country codes

UC_NM_MN: Main city (Urban Centre) name

UC_NM_LST: Full list of assigned city (Urban Centre) names

CLUSTER: Assigned cluster number in industrial land modeling

URB_ECOREGION: Assigned urban ecoregion

IND_YEAR: Total industrial land area for each year (in m²).

Validation Package

This package includes validation samples used for industrial land mapping validation.

Validation_shapefile/: Contains the validation shapefiles used for assessing the accuracy of industrial land maps.

Validation of Industrial Land Map Using CO₂ Emissions Data.xlsx: Includes validation results comparing industrial land maps with CO₂ emissions data.

Association between Proposed Industrial Land and the Official Data.xlsx: Contains data assessing the relationship between the proposed industrial land and official datasets.

Data from: Login Data Set for Risk-Based Authentication

zenodo.org
data.niaid.nih.gov

zip

Updated Jun 30, 2022

Stephan Wiefling; Paul René Jørgensen; Sigurd Thunem; Luigi Lo Iacono (2022). Login Data Set for Risk-Based Authentication [Dataset]. http://doi.org/10.5281/zenodo.6782156

Explore at:

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.6782156

Dataset updated

Jun 30, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Stephan Wiefling; Paul René Jørgensen; Sigurd Thunem; Luigi Lo Iacono

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

Login Data Set for Risk-Based Authentication

Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.

This data set aims to foster research and development for Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.

The users used this SSO to access sensitive data provided by the online service, e.g., cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce the results obtained on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.

WARNING: The feature values are plausible, but still totally artificial. Therefore, you should NOT use this data set in productive systems, e.g., intrusion detection systems.

Overview

The data set contains the following features related to each login attempt on the SSO:

Feature	Data Type	Description	Range or Example
IP Address	String	IP address belonging to the login attempt	0.0.0.0 - 255.255.255.255
Country	String	Country derived from the IP address	US
Region	String	Region derived from the IP address	New York
City	String	City derived from the IP address	Rochester
ASN	Integer	Autonomous system number derived from the IP address	0 - 600000
User Agent String	String	User agent string submitted by the client	Mozilla/5.0 (Windows NT 10.0; Win64; ...
OS Name and Version	String	Operating system name and version derived from the user agent string	Windows 10
Browser Name and Version	String	Browser name and version derived from the user agent string	Chrome 70.0.3538
Device Type	String	Device type derived from the user agent string	(`mobile`, `desktop`, `tablet`, `bot`, `unknown`)¹
User ID	Integer	Identification number related to the affected user account	[Random pseudonym]
Login Timestamp	Integer	Timestamp related to the login attempt	[64 Bit timestamp]
Round-Trip Time (RTT) [ms]	Integer	Server-side measured latency between client and server	1 - 8600000
Login Successful	Boolean	`True`: Login was successful, `False`: Login failed	(`true`, `false`)
Is Attack IP	Boolean	IP address was found in known attacker data set	(`true`, `false`)
Is Account Takeover	Boolean	Login attempt was identified as account takeover by incident response team of the online service	(`true`, `false`)

Data Creation

As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All the other data was randomly generated while maintaining logical relations and temporal order between the features.

The timestamps, however, are not identical and contain randomness. The feature values related to IP address and user agent string were randomly generated by publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation. Therefore, the RTT entries were probably in other positions in the original data set.

The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.
The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical to those in the real data set.
The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.

Regarding the Data Values

Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.

You can recognize them by the following values:

ASNs with values >= 500000
IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)
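
The two markers above can be checked with the standard library. This is a minimal sketch rather than the authors' tooling: the function name `is_placeholder` and the choice to flag a row when either condition holds are assumptions.

```python
import ipaddress

# Markers for unrealistic values, as described in the data set documentation.
PLACEHOLDER_NETWORK = ipaddress.ip_network("10.0.0.0/8")
PLACEHOLDER_ASN_MIN = 500_000


def is_placeholder(ip: str, asn: int) -> bool:
    """Return True if the IP address or the ASN is one of the unrealistic values.

    Assumption: a login attempt is flagged when either marker matches.
    """
    in_reserved_range = ipaddress.ip_address(ip) in PLACEHOLDER_NETWORK
    return in_reserved_range or asn >= PLACEHOLDER_ASN_MIN


if __name__ == "__main__":
    samples = [("10.1.2.3", 1234), ("8.8.8.8", 15169), ("8.8.8.8", 500000)]
    for ip, asn in samples:
        print(ip, asn, is_placeholder(ip, asn))
```

Flagging these rows, rather than dropping them, keeps the data set usable for reproducing risk scores while letting geolocation analyses exclude them.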

Study Reproduction

Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.

The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.

See RESULTS.md for more details.

Ethics

By using the SSO service, the users agreed to the data collection and evaluation for research purposes. For study reproduction and fostering RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.

The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.

Publication

You can find more details on our conducted study in the following journal article:

Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022)
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono.
ACM Transactions on Privacy and Security

Bibtex

@article{Wiefling_Pump_2022,
 author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi},
 title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}},
 journal = {{ACM} {Transactions} on {Privacy} and {Security}},
 doi = {10.1145/3546069},
 publisher = {ACM},
 year  = {2022}
}

License

This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:

Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069

¹ A few (invalid) user agent strings from the original data set could not be parsed, so their device type is empty. Because this parse error may be useful information for your studies, we kept these 1526 entries.

LSIB 2017: Large Scale International Boundary Polygons, Simplified
developers.google.com
Updated Mar 30, 2017
United States Department of State, Office of the Geographer (2017). LSIB 2017: Large Scale International Boundary Polygons, Simplified [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/USDOS_LSIB_SIMPLE_2017
Explore at:
Dataset updated
Mar 30, 2017
Dataset provided by
United States Department of State, Office of the Geographer
Time period covered
Mar 30, 2017
Area covered
Earth
Description
The United States Office of the Geographer provides the Large Scale International Boundary (LSIB) dataset. The detailed version (2013) is derived from two other datasets: a LSIB line vector file and the World Vector Shorelines (WVS) from the National Geospatial-Intelligence Agency (NGA). The interior boundaries reflect U.S. government policies on boundaries, boundary disputes, and sovereignty. The exterior boundaries are derived from the WVS; however, the WVS coastline data is outdated and generally shifted by anywhere from several hundred meters to over a kilometer. Each feature is the polygonal area enclosed by interior boundaries and exterior coastlines where applicable, and many countries consist of multiple features, one per disjoint region. Compared with the detailed LSIB, in this simplified dataset some disjoint regions of each country have been reduced to a single feature. Furthermore, it excludes medium and smaller islands. The resulting simplified boundary lines are rarely shifted by more than 100 meters from the detailed LSIB lines. Each of the 312 features is a part of the geometry of one of the 284 countries described in this dataset.
Global Air Quality Dataset 🌍🌫️
kaggle.com
Updated Jul 28, 2024
Waqar Ali (2024). Global Air Quality Dataset 🌍🌫️ [Dataset]. https://www.kaggle.com/datasets/waqi786/global-air-quality-dataset
Explore at:
Dataset updated
Jul 28, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Waqar Ali
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
The Global Air Quality Data dataset provides an extensive compilation of air quality measurements from various prominent cities worldwide. This dataset includes crucial environmental indicators such as particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), and ozone (O3), along with meteorological data like temperature, humidity, and wind speed. With 10,000 records, this dataset is ideal for researchers, data scientists, and policy makers looking to analyze air quality trends, understand the impact of pollution on health, and develop strategies for environmental improvement.

The dataset is composed of the following columns:

City: The name of the city where the air quality measurement was taken.
Country: The country in which the city is located.
Date: The date when the measurement was recorded.
PM2.5: The concentration of fine particulate matter with a diameter of less than 2.5 micrometers (µg/m³).
PM10: The concentration of particulate matter with a diameter of less than 10 micrometers (µg/m³).
NO2: The concentration of nitrogen dioxide (µg/m³).
SO2: The concentration of sulfur dioxide (µg/m³).
CO: The concentration of carbon monoxide (mg/m³).
O3: The concentration of ozone (µg/m³).
Temperature: The temperature at the time of measurement (°C).
Humidity: The humidity level at the time of measurement (%).
Wind Speed: The wind speed at the time of measurement (m/s).
Student Academic Performance and Probation Dataset
kaggle.com
Updated Nov 29, 2024
Rathnakumarw (2024). Student Academic Performance and Probation Dataset [Dataset]. https://www.kaggle.com/datasets/rathnakumarw/student-academic-performance-and-probation-dataset
Explore at:
Dataset updated
Nov 29, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rathnakumarw
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Description

This dataset consists of academic and demographic information about 300 students from a university, which can be used for predicting academic outcomes, such as probation status. The dataset was simulated to represent a variety of student attributes across multiple categories like personal data, academic history, and other related information. The primary goal of this dataset is to analyze factors contributing to academic performance and identify students at risk of probation.

Column Descriptions

Student No.: (Numeric) A unique identifier for each student. In this dataset, each student has a different ID number, making it a 100% unique column.
Cohort: (Numeric) The year a student enrolled in the university. No missing values and consistent across the dataset.
College: (Nominal) The name of the college the student belongs to. Examples include "Engineering," "Science," etc. No missing values.
College Code: (Nominal) A numerical or alphanumerical code representing the college. This is an alternative representation of the "College" column.
Major: (Nominal) The major field of study of the student. Some missing values (23%) represent students who haven’t declared a major or are in an undeclared status.
Major Code: (Nominal) A code representing the major subject. Similar to the "Major" column, this has 23% missing values due to undeclared majors.
Minor: (Nominal) The minor subject, if any, chosen by the student. This column has a high percentage of missing data (91%) since most students do not have minors.
Spec: (Nominal) Specialization within the major field of study. Like the "Minor" column, this has 93% missing data as most students do not declare a specialization.
Degree: (Numeric) The type of degree the student is pursuing (e.g., Bachelor's). In this dataset, all students are pursuing the same degree, so there are no missing values.
Status: (Nominal) The current academic standing of the student (e.g., "Active," "Inactive"). No missing values.
Load Status: (Nominal) The academic load status (e.g., "Full-time," "Part-time"). This column has very few missing values (1%).
Gender: (Nominal) The gender of the student (e.g., "Male," "Female"). No missing values.
Country: (Nominal) The country of origin of the student. Only 2 missing values, making it nearly complete.
Governorate: (Nominal) The administrative region (governorate) the student comes from. This column has a small percentage of missing values (1%).
Wellayah: (Nominal) The district or locality within the governorate. Around 1% of the data is missing.
CGPA: (Numeric) The cumulative grade point average (CGPA) of the student. This field has 145 missing values, representing students without available CGPA records.
Estimated Graduation Year: (Numeric) The expected year in which the student will graduate. No missing values.
From HEAC: (Nominal) Indicates whether the student was admitted through the Higher Education Admission Center (HEAC). This column has 4% missing values.
Admission Category: (Nominal) The category of admission (e.g., scholarship, self-funded). This column has a significant amount of missing data (98%), indicating that admission category data is either unavailable or irrelevant for most students.
Birth Date: (Nominal) The birth date of the student. The dataset includes very few missing values (0%) and has been replaced by the derived feature "Age."
Actual Graduation Date: (Nominal) The actual date on which a student graduates. More than half of the values are missing (54%), representing students who haven’t graduated yet.
Withdrawal: (Nominal) Indicates whether the student has withdrawn from the university. This column has 89% missing data since the majority of students haven’t withdrawn.
Marital Status: (Nominal) The marital status of the student (e.g., "Single," "Married"). No missing values.
SQU Hostel: (Nominal) Indicates whether the student lives in the university hostel. No missing values.
Percentage (Secondary School Score): (Nominal) The student’s percentage score from secondary school. No missing values.
Probation Student: (Nominal) Indicates whether the student is under academic probation. This is the target variable for classification, with no missing values.

Record Details

Total Records: 300
Total Attributes: 26
Missing Values: Some columns have a significant proportion of missing data (e.g., Minor, Spec, Major Code), while others have very few or no missing values (e.g., Gender, Cohort, College). Missing values were handled using a placeholder for clarity in certain columns.
Construction Data | Building Materials & Construction Industry Leaders in...
datarade.ai
Success.ai, Construction Data | Building Materials & Construction Industry Leaders in Europe | Verified Global Profiles from 700M+ Dataset | Best Price Guarantee [Dataset]. https://datarade.ai/data-products/construction-data-building-materials-construction-industr-success-ai
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset provided by
Area covered
Monaco, Serbia, Slovakia, Gibraltar, Moldova (Republic of), Croatia, Estonia, Bosnia and Herzegovina, Åland Islands, Malta
Description
Success.ai’s Construction Data for Building Materials & Construction Industry Leaders in Europe provides a reliable dataset tailored for businesses seeking to connect with leaders in the European construction and building materials sectors. Covering contractors, suppliers, architects, and project managers, this dataset offers verified profiles, firmographic insights, and decision-maker contacts.

With access to over 700 million verified global profiles and data from 70 million businesses, Success.ai ensures that your outreach, market analysis, and strategic partnerships are powered by accurate, continuously updated, and AI-validated information. Backed by our Best Price Guarantee, this solution empowers you to engage effectively with the construction industry across Europe.

Why Choose Success.ai’s Construction Data?

Verified Contact Data for Industry Leaders

Access verified work emails, phone numbers, and LinkedIn profiles of construction executives, project managers, and building material suppliers.

AI-driven validation ensures 99% accuracy, optimizing outreach and minimizing inefficiencies in communication.

Comprehensive Coverage Across Europe’s Construction Sector

Includes profiles from major construction hubs such as Germany, France, the UK, Italy, and Spain, covering a diverse range of projects and organizations.

Gain insights into regional construction trends, material sourcing strategies, and large-scale project developments.

Continuously Updated Datasets

Real-time updates reflect changes in leadership, market expansions, material innovations, and project announcements.

Stay ahead of market trends to align your strategies with evolving industry needs.

Ethical and Compliant

Adheres to GDPR, CCPA, and other global privacy regulations, ensuring responsible use of data and compliance with legal standards.

Data Highlights:

700M+ Verified Global Profiles: Engage with decision-makers, contractors, architects, and engineers in Europe’s construction sector.

70M Business Profiles: Access firmographic data, including company sizes, revenue ranges, and geographic locations.

Decision-Maker Contacts: Connect directly with CEOs, procurement officers, and project leads driving construction projects and material procurement.

Industry Insights: Gain visibility into supply chain networks, sustainable building initiatives, and innovative construction techniques.

Key Features of the Dataset:

Leadership Profiles in Construction

Identify and connect with leaders responsible for major construction projects, material sourcing, and architectural planning.

Target professionals making decisions on vendor selection, project timelines, and compliance.

Advanced Filters for Precision Campaigns

Filter companies by industry segment (commercial construction, residential, infrastructure), geographic location, or revenue size.

Tailor campaigns to align with regional construction challenges, such as sustainability, cost management, or urbanization.

Firmographic Insights and Project Data

Access data on company structures, project scopes, and market positioning to refine your targeting strategy.

Use these insights to identify high-value prospects and uncover new business opportunities.

AI-Driven Enrichment

Profiles enriched with actionable data enable personalized messaging, highlight unique value propositions, and improve engagement outcomes with construction stakeholders.

Strategic Use Cases:

Sales and Vendor Development

Offer construction materials, tools, or technology solutions to procurement teams and project managers in the construction industry.

Build relationships with vendors and contractors seeking innovative solutions to streamline operations and reduce costs.

Market Research and Competitive Analysis

Analyze trends in material usage, construction technologies, and sustainable practices to guide product development and marketing strategies.

Benchmark against competitors to identify market gaps, emerging needs, and high-growth opportunities.

Partnership Development and Supply Chain Optimization

Engage with companies seeking partnerships for large-scale projects, material sourcing, or technology integration.

Foster alliances that drive efficiency, quality, and innovation in construction projects.

Recruitment and Workforce Solutions

Target HR professionals and hiring managers recruiting skilled workers, architects, or engineers for ongoing and upcoming projects.

Provide staffing services, training platforms, or workforce optimization tools tailored to the construction sector.

Why Choose Success.ai?

Best Price Guarantee

Access premium-quality construction data at competitive prices, ensuring strong ROI for your marketing, sales, and bu...
World Important Events - Ancient to Modern
kaggle.com
Updated Feb 3, 2024
Saket Kumar (2024). World Important Events - Ancient to Modern [Dataset]. http://doi.org/10.34740/kaggle/dsv/7545558
Explore at:
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7545558
Dataset updated
Feb 3, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Saket Kumar
License
http://opendatacommons.org/licenses/dbcl/1.0/
Area covered
World
Description
This dataset, "World Important Events - Ancient to Modern," spans significant historical milestones from ancient times to the modern era, covering diverse global incidents. It provides a comprehensive timeline of events that have shaped the world, offering insights into wars, cultural shifts, technological advancements, and social movements.

Column Descriptions:

Sl. No: Serial number.
Name of Incident: Title of the event.
Date, Month, Year: When the event occurred.
Country: Where it happened.
Type of Event: Nature of the event (e.g., War, Revolution).
Place Name: Specific location of the event.
Impact: Brief description of the event's impact.
Affected Population: Who was impacted.
Important Person/Group Responsible: Key figures or groups involved.
Outcome: Result of the event (Positive, Negative, Mixed).

Leverage this dataset for data analytics, cleaning, and visualization to uncover insights. Ideal for historical analysis and educational exploration, it offers a rich foundation for research, storytelling, and understanding global events' impact.

Image attribute : Image By freepik
England and Wales Census 2021 - RM010: Country of birth by ethnic group
statistics.ukdataservice.ac.uk
csv, json, xlsx
Updated May 9, 2023
Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2023). England and Wales Census 2021 - RM010: Country of birth by ethnic group [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-rm010-country-of-birth-by-ethnic-group
Explore at:
Available download formats: xlsx, json, csv
Dataset updated
May 9, 2023
Dataset provided by
Northern Ireland Statistics and Research Agency
Office for National Statistics (http://www.ons.gov.uk/)
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
Wales, England
Description
This dataset provides Census 2021 estimates that classify usual residents in England and Wales by country of birth and by ethnic group. The estimates are as at Census Day, 21 March 2021.

Area type

Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.

For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.

Coverage

Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:

country - for example, Wales

region - for example, London

local authority - for example, Cornwall

health area – for example, Clinical Commissioning Group

statistical area - for example, MSOA or LSOA

Country of birth

The country in which a person was born.

For people not born in one of the four parts of the UK, there was an option to select "elsewhere".

People who selected "elsewhere" were asked to write in the current name for their country of birth.

Ethnic group

The ethnic group that the person completing the census feels they belong to. This could be based on their culture, family background, identity or physical appearance.

Respondents could choose one out of 19 tick-box response categories, including write-in response options.
U.S. Department of State (Point of Contact) (2025). Large Scale International Boundaries [Dataset]. https://catalog.data.gov/dataset/large-scale-international-boundaries

Large Scale International Boundaries

Explore at:

31 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

May 23, 2025

Dataset provided by

United States Department of State (http://state.gov/)

Description

Overview

The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. The current edition is version 11.4 (published 24 February 2025). The 11.4 release contains updated boundary lines and data refinements designed to extend the functionality of the dataset. These data and generalized derivatives are the only international boundary lines approved for U.S. Government use. The contents of this dataset reflect U.S. Government policy on international boundary alignment, political recognition, and dispute status. They do not necessarily reflect de facto limits of control.

National Geospatial Data Asset

This dataset is a National Geospatial Data Asset (NGDAID 194) managed by the Department of State. It is a part of the International Boundaries Theme created by the Federal Geographic Data Committee.

Dataset Source Details

Sources for these data include treaties, relevant maps, and data from boundary commissions, as well as national mapping agencies. Where available and applicable, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery process includes analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.

Cartographic Visualization

The LSIB is a geospatial dataset that, when used for cartographic purposes, requires additional styling. The LSIB download package contains example style files for commonly used software applications. The attribute table also contains embedded information to guide the cartographic representation. Additional discussion of these considerations can be found in the Use of Core Attributes in Cartographic Visualization section below.
Additional cartographic information pertaining to the depiction and description of international boundaries or areas of special sovereignty can be found in Guidance Bulletins published by the Office of the Geographer and Global Issues: https://data.geodata.state.gov/guidance/index.html

Contact

Direct inquiries to internationalboundaries@state.gov.

Direct download: https://data.geodata.state.gov/LSIB.zip

Attribute Structure

The dataset uses the following attributes divided into two categories:

ATTRIBUTE NAME | ATTRIBUTE STATUS
CC1 | Core
CC1_GENC3 | Extension
CC1_WPID | Extension
COUNTRY1 | Core
CC2 | Core
CC2_GENC3 | Extension
CC2_WPID | Extension
COUNTRY2 | Core
RANK | Core
LABEL | Core
STATUS | Core
NOTES | Core
LSIB_ID | Extension
ANTECIDS | Extension
PREVIDS | Extension
PARENTID | Extension
PARENTSEG | Extension

These attributes have external data sources that update separately from the LSIB:

ATTRIBUTE NAME | ATTRIBUTE STATUS
CC1 | GENC
CC1_GENC3 | GENC
CC1_WPID | World Polygons
COUNTRY1 | DoS Lists
CC2 | GENC
CC2_GENC3 | GENC
CC2_WPID | World Polygons
COUNTRY2 | DoS Lists
LSIB_ID | BASE
ANTECIDS | BASE
PREVIDS | BASE
PARENTID | BASE
PARENTSEG | BASE

The core attributes listed above describe the boundary lines contained within the LSIB dataset. Removal of core attributes from the dataset will change the meaning of the lines. An attribute status of “Extension” represents a field containing data interoperability information. Other attributes not listed above include “FID”, “Shape_length” and “Shape.” These are components of the shapefile format and do not form an intrinsic part of the LSIB.

Core Attributes

The eight core attributes listed above contain unique information which, when combined with the line geometry, comprise the LSIB dataset. These Core Attributes are further divided into Country Code and Name Fields and Descriptive Fields.

Country Code and Country Name Fields

“CC1” and “CC2” fields are machine readable fields that contain political entity codes.
These are two-character codes derived from the Geopolitical Entities, Names, and Codes Standard (GENC), Edition 3 Update 18. The “CC1_GENC3” and “CC2_GENC3” fields contain the corresponding three-character GENC codes and are extension attributes discussed below. The code “Q2” (or its three-character counterpart “QX2”) denotes a line in the LSIB representing a boundary associated with areas not contained within the GENC standard.

The “COUNTRY1” and “COUNTRY2” fields contain the names of the corresponding political entities. These fields contain names approved by the U.S. Board on Geographic Names (BGN) as incorporated in the "Independent States in the World" and "Dependencies and Areas of Special Sovereignty" lists maintained by the Department of State. To ensure maximum compatibility, names are presented without diacritics and certain names are rendered using common cartographic abbreviations. Names for lines associated with the code "Q2" are descriptive and not necessarily BGN-approved. Names rendered in all CAPITAL LETTERS denote independent states. Names rendered in normal text represent dependencies, areas of special sovereignty, or are otherwise presented for the convenience of the user.

Descriptive Fields
The following text fields are a part of the core attributes of the LSIB dataset and do not update from external sources. They provide additional information about each of the lines and are as follows:

ATTRIBUTE NAME | CONTAINS NULLS
RANK | No
STATUS | No
LABEL | Yes
NOTES | Yes

Neither the "RANK" nor the "STATUS" field contains null values; the "LABEL" and "NOTES" fields can. The "RANK" field is a numeric expression of the "STATUS" field. Combined with the line geometry, these fields encode the views of the United States Government on the political status of the boundary line.
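The field conventions described above can be checked mechanically. The following is a minimal sketch, assuming each LSIB line has already been read into a plain Python dictionary keyed by attribute name. The helper names and the sample record are hypothetical and are not part of the LSIB download package; the sample values are placeholders, not real dataset entries.

```python
from typing import List

# Per the description: RANK and STATUS never contain nulls,
# while LABEL and NOTES may.
REQUIRED_FIELDS = ("RANK", "STATUS")

# Codes for areas not contained within the GENC standard.
NON_GENC_CODES = ("Q2", "QX2")


def missing_required(record: dict) -> List[str]:
    """Return the required descriptive fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def is_independent_state(name: str) -> bool:
    """Apply the all-capitals convention to a COUNTRY1/COUNTRY2 value."""
    return name.isupper()


def is_outside_genc(code: str) -> bool:
    """True when a CC1/CC2 (or *_GENC3) code marks a non-GENC area."""
    return code in NON_GENC_CODES


# Hypothetical record with placeholder values, for illustration only.
sample = {
    "CC1": "AA", "COUNTRY1": "ALPHA",
    "CC2": "Q2", "COUNTRY2": "Beta Island",
    "RANK": 1, "STATUS": "International Boundary",
    "LABEL": None, "NOTES": None,
}

print(missing_required(sample))                  # no required field is missing
print(is_independent_state(sample["COUNTRY1"]))  # all capitals
print(is_independent_state(sample["COUNTRY2"]))  # normal text
print(is_outside_genc(sample["CC2"]))
```

The capitalization check only mirrors the naming convention stated in the description; it is not a substitute for the Department of State lists themselves.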
RANK | STATUS
1 | International Boundary
2 | Other Line of International Separation
3 | Special Line

A value of “1” in the “RANK” field corresponds to an “International Boundary” value in the “STATUS” field. Values of “2” and “3” correspond to “Other Line of International Separation” and “Special Line,” respectively.

The “LABEL” field contains required text to describe the line segment on all finished cartographic products, including but not limited to print and interactive maps.

The “NOTES” field contains an explanation of special circumstances modifying the lines. This information can pertain to the origins of the boundary lines, limitations regarding the purpose of the lines, or the original source of the line.

Use of Core Attributes in Cartographic Visualization
Several of the Core Attributes provide information required for the proper cartographic representation of the LSIB dataset. The cartographic usage of the LSIB requires a visual differentiation between the three categories of boundary lines. Specifically, this differentiation must be between: International Boundaries (Rank 1); Other Lines of International Separation (Rank 2); and Special Lines (Rank 3). Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the “LABEL” field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Please consult the style files in the download package for examples of this depiction.

The requirement to incorporate the contents of the "LABEL" field on cartographic products is scale dependent. If a label is legible at the scale of a given static product, proper use of this dataset encourages applying that label.
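The rank and status pairing and the prominence ordering described above can be sketched in a few lines of Python. This is an illustration only: the line widths and dash patterns are hypothetical values chosen to satisfy the ordering (Rank 1 most prominent, Rank 3 least), not the styles shipped in the download package, and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# RANK is the numeric expression of STATUS, as stated in the description.
STATUS_BY_RANK = {
    1: "International Boundary",
    2: "Other Line of International Separation",
    3: "Special Line",
}


@dataclass(frozen=True)
class LineStyle:
    width_pt: float
    dash: Tuple[int, ...]  # empty tuple means a solid line


# Hypothetical styling values; only the relative prominence matters here.
STYLE_BY_RANK = {
    1: LineStyle(width_pt=1.5, dash=()),
    2: LineStyle(width_pt=1.0, dash=(4, 2)),
    3: LineStyle(width_pt=0.6, dash=(1, 2)),
}


def rank_matches_status(rank: int, status: str) -> bool:
    """Check that a record's RANK and STATUS values agree."""
    return STATUS_BY_RANK.get(rank) == status


def label_for(record: dict, label_is_legible: bool) -> Optional[str]:
    """Return the LABEL text to draw, or None.

    Only LABEL is used as label text; a null LABEL or an illegible
    map scale yields no label.
    """
    label = record.get("LABEL")
    if label and label_is_legible:
        return label
    return None


# Hypothetical record for illustration.
line = {"RANK": 2, "STATUS": STATUS_BY_RANK[2], "LABEL": "Example label text"}
print(rank_matches_status(line["RANK"], line["STATUS"]))
print(STYLE_BY_RANK[line["RANK"]])
print(label_for(line, label_is_legible=True))
```

Keeping the legend text in `STATUS_BY_RANK` and the label text in `LABEL` separate reflects the distinction the description draws between legend entries and line labels.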
Using the contents of the "COUNTRY1" and "COUNTRY2" fields in the generation of a line segment label is not required. The "STATUS" field contains the preferred description for the three LSIB line types when they are incorporated into a map legend but is otherwise not to be used for labeling. Use of the “CC1,” “CC1_GENC3,” “CC2,” “CC2_GENC3,” “RANK,” or “NOTES” fields for cartographic labeling purposes is prohibited.

Extension Attributes
Certain attributes within the LSIB dataset extend data functionality to make the data more interoperable or to provide clearer linkages to other datasets.

The fields “CC1_GENC3” and “CC2_GENC3” contain the three-character GENC codes corresponding to the “CC1” and “CC2” attributes. The code “QX2” is the three-character counterpart of the code “Q2,” which denotes a line in the LSIB representing a boundary associated with a geographic area not contained within the GENC standard.

To allow for linkage between individual lines in the LSIB and the World Polygons dataset, the “CC1_WPID” and “CC2_WPID” fields contain a Universally Unique Identifier (UUID), version 4, which provides a stable description of each geographic entity in a boundary pair relationship. Each UUID corresponds to a geographic entity listed in the World Polygons dataset.

Five additional fields in the LSIB expand on the UUID concept and either describe features that have changed across space and time or indicate relationships between previous versions of the feature. The “LSIB_ID” attribute is a UUID value that defines a specific instance of a feature. Any change to the feature in a lineset requires a new “LSIB_ID.” The “ANTECIDS,” or antecedent ID, is a UUID that references line geometries from which a given line is descended in time. It is used when there is a feature that is entirely new, not when there is a new version of a previous feature.
The “ANTECIDS” field is generally used to reference countries that have dissolved. The “PREVIDS,” or Previous ID, is a UUID field that contains the IDs of old versions of a line. This is an additive field that houses all Previous IDs. A new version of a feature is defined by any change to the

Large Scale International Boundaries

Country Codes

Geonames - All Cities with a population > 1000

Data from: Caravan - A global community dataset for large-sample hydrology

Caravan - A global community dataset for large-sample hydrology (csv...

The big dataset of ultra-marathon running

Data from: Large Landing Trajectory Data Set for Go-Around Analysis

load minimum data set

select London City Airport, go-arounds, and 2019-01-04

Dataset of 'Mapping 10-m Industrial Lands across 1000+ Global Large Cities,...

File Naming Convention

City Information

Validation Package

Data from: Login Data Set for Risk-Based Authentication

LSIB 2017: Large Scale International Boundary Polygons, Simplified

Global Air Quality Dataset 🌍🌫️

Student Academic Performance and Probation Dataset

Construction Data | Building Materials & Construction Industry Leaders in...

World Important Events - Ancient to Modern

England and Wales Census 2021 - RM010: Country of birth by ethnic group

Large Scale International Boundaries