The data presented in this data release represent observations of post-fire debris flows collected from publicly available datasets. Data originate from 13 countries: the United States, Australia, China, Italy, Greece, Portugal, Spain, the United Kingdom, Austria, Switzerland, Canada, South Korea, and Japan. The data are located in the file “PFDF_database_sortedbyReference.txt”, and a description of each column header can be found in both the file “column_headers.txt” and the metadata file (“Post-fire Debris-Flow Database (Literature Derived).xml”). The observations are derived from areas burned by wildfire and are global in extent. However, because this dataset is synthesized from information collected by many different researchers for different purposes, not all fields are available for every observation. Missing information is indicated by the value “-9999” in the “PFDF_database_sortedbyReference.txt” file. Note that the text file contains special characters and a mix of date-time formats that reflect the original data provided by the authors. The text may not display correctly when opened in proprietary software such as Microsoft Excel but will appear correctly when opened in a text editor.
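Because of the “-9999” sentinel, special characters, and mixed date-time formats, a little care is needed when loading the file programmatically. The sketch below shows one way to do this with pandas; the sample data, column names, and tab delimiter are assumptions for illustration, not taken from the actual release file.

```python
import io
import pandas as pd

# Hypothetical sample mimicking the release file; the real
# "PFDF_database_sortedbyReference.txt" may use a different delimiter
# and different column names.
sample = io.StringIO(
    "Country\tRainfall_mm\tDate\n"
    "United States\t25.4\t2018-01-09\n"
    "Japan\t-9999\t10/28/2019\n"
)

# Treat the sentinel "-9999" as missing, and read as plain text so that
# special characters and mixed date formats survive intact (dates can be
# parsed per-row later once their formats are known).
df = pd.read_csv(sample, sep="\t", na_values=["-9999"])
print(df["Rainfall_mm"].isna().sum())  # → 1
```

Opening the file this way avoids the display problems mentioned above for spreadsheet software, since no implicit type or date conversion is applied.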
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains data from the top-ranked 250 Korean Dramas as per the MyDramaList website. The data has been collected and uploaded in the form of a CSV file and can be used to work on various Data Science Projects.
The CSV file has 17 columns and 251 rows (including the header row), containing mostly textual data.
Most of the data were collected from the MyDramaList website (https://mydramalist.com), and the names of the production companies were collected from Wikipedia (https://www.wikipedia.org). I wasn't sure how to scrape data at the time, so I did it all manually, copying and pasting with the cursor. (Yes, it was very tedious to copy and paste the data by hand!)
I was working on a content-based recommender system for Korean dramas and needed data to work with. The datasets available on Kaggle contained at most 100 k-drama titles, and quite a few features deemed essential were missing: the synopsis, tags, director's name, cast names, production companies' names, and similar data weren't available in the pre-existing datasets.
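A content-based recommender of the kind described can be sketched in a few lines. The snippet below is a toy illustration only: it assumes (hypothetically) that each drama's synopsis and tags have been concatenated into one text field, and it uses simple Jaccard token overlap rather than any particular method from the project.

```python
# Toy content-based recommender over made-up drama descriptions.
def jaccard(a, b):
    """Jaccard similarity between two lower-cased token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Illustrative stand-ins for the Synopsis + Tags text of each title.
dramas = {
    "Drama A": "revenge thriller lawyer",
    "Drama B": "romance office comedy",
    "Drama C": "legal thriller revenge",
}

def recommend(title):
    """Return the other title whose description is most similar."""
    query = dramas[title]
    ranked = sorted((t for t in dramas if t != title),
                    key=lambda t: jaccard(query, dramas[t]), reverse=True)
    return ranked[0]

print(recommend("Drama A"))  # → Drama C
```

In practice one would typically swap the Jaccard measure for TF-IDF vectors with cosine similarity, but the overall shape of the pipeline is the same.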
Original Data Source: Top 250 Korean Dramas (KDrama) Dataset
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
This dataset provides detailed information on road surfaces from OpenStreetMap (OSM) data, distinguishing between paved and unpaved surfaces across the region. The surface information is based on road surface predictions derived from a hybrid deep-learning approach. For more information on the methods, refer to the paper.
Roughly 0.1183 million km of roads are mapped in OSM in this region. Based on AI-mapped estimates, the paved and unpaved shares are approximately 0.0015 and 0.0135 million km, corresponding to 1.2687% and 11.4095%, respectively, of the total road length in the dataset region. Road surface information is missing in OSM for 0.1033 million km, or 87.3218%, of the total. To fill this gap, the Mapillary-derived road surface dataset provides an additional 0.0 million km of information (corresponding to 0.0422% of the total missing information on road surface).
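The quoted percentages can be reproduced directly from the rounded lengths. The check below uses the figures as stated above; small discrepancies in the last decimal places are expected because the lengths themselves are rounded to four decimal places.

```python
# Sanity check of the road-surface shares quoted above (lengths in million km).
total = 0.1183
paved, unpaved, missing = 0.0015, 0.0135, 0.1033

paved_pct = 100 * paved / total      # ≈ 1.27 %
unpaved_pct = 100 * unpaved / total  # ≈ 11.41 %
missing_pct = 100 * missing / total  # ≈ 87.32 %

print(round(paved_pct, 2), round(unpaved_pct, 2), round(missing_pct, 2))
```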
It is intended for use in transportation planning, infrastructure analysis, climate emissions studies, and geographic information system (GIS) applications.
This dataset provides comprehensive information on road and urban area features, including location, surface quality, and classification metadata. This dataset includes attributes from OpenStreetMap (OSM) data, AI predictions for road surface, and urban classifications.
AI features:
pred_class: Model-predicted class for the road surface, with values "paved" or "unpaved."
pred_label: Binary label associated with pred_class (0 = paved, 1 = unpaved).
osm_surface_class: Classification of the surface type from OSM, categorized as "paved" or "unpaved."
combined_surface_osm_priority: Surface classification combining pred_label and surface (OSM) while prioritizing the OSM surface tag, classified as "paved" or "unpaved."
combined_surface_DL_priority: Surface classification combining pred_label and surface (OSM) while prioritizing the DL prediction pred_label, classified as "paved" or "unpaved."
n_of_predictions_used: Number of predictions used for the feature length estimation.
predicted_length: Predicted length based on the DL model’s estimations, in meters.
DL_mean_timestamp: Mean timestamp of the predictions used, for comparison.
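The two combined-surface attributes above differ only in which source wins when both a DL prediction and an OSM surface tag are present. The helper below is a hypothetical reconstruction of that fallback logic, inferred from the attribute descriptions; the dataset's actual implementation may differ.

```python
# Hypothetical reconstruction of the combined-surface logic described above.
def combine_surface(pred_label, osm_surface, prioritize_osm=True):
    """Return "paved"/"unpaved" from a DL prediction and an OSM tag.

    pred_label: 0 = paved, 1 = unpaved, or None if no prediction exists
    osm_surface: "paved", "unpaved", or None if the OSM tag is missing
    prioritize_osm: True -> combined_surface_osm_priority,
                    False -> combined_surface_DL_priority
    """
    dl = None if pred_label is None else ("unpaved" if pred_label else "paved")
    if prioritize_osm:
        return osm_surface if osm_surface is not None else dl
    return dl if dl is not None else osm_surface

print(combine_surface(1, "paved", prioritize_osm=True))    # → paved
print(combine_surface(1, "paved", prioritize_osm=False))   # → unpaved
print(combine_surface(None, "unpaved", prioritize_osm=False))  # → unpaved
```

Either way, when only one source is available, its classification is used unchanged; the priority only matters where the two disagree.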
OSM features may have these attributes (tag definitions are documented on the OSM wiki):
name: Name of the feature, if available in OSM.
name:en: Name of the feature in English, if available in OSM.
name:* (in local language): Name of the feature in the local official language, where available.
highway: Road classification based on OSM tags (e.g., residential, motorway, footway).
surface: Description of the surface material of the road (e.g., asphalt, gravel, dirt).
smoothness: Assessment of surface smoothness (e.g., excellent, good, intermediate, bad).
width: Width of the road, where available.
lanes: Number of lanes on the road.
oneway: Indicates if the road is one-way (yes or no).
bridge: Specifies if the feature is a bridge (yes or no).
layer: Indicates the layer of the feature in cases where multiple features are stacked (e.g., bridges, tunnels).
source: Source of the data, indicating the origin or authority of specific attributes.
Urban classification features may have these attributes:
continent: The continent where the data point is located (e.g., Europe, Asia).
country_iso_a2: The ISO Alpha-2 code representing the country (e.g., "US" for the United States).
urban: Binary indicator for urban areas based on the GHSU Urban Layer 2019 (0 = rural, 1 = urban).
urban_area: Name of the urban area or city where the data point is located.
osm_id: Unique identifier assigned by OpenStreetMap (OSM) to each feature.
osm_type: Type of OSM element (e.g., node, way, relation).
The data originates from OpenStreetMap (OSM) and is augmented with model predictions using images downloaded from Mapillary in combination with the GHSU Global Human Settlement Urban Layer 2019 and AFRICAPOLIS2020 urban layer.
This dataset is one of many HeiGIT exports on HDX. See the HeiGIT website for more information.
We are looking forward to hearing about your use-case! Feel free to reach out to us and tell us about your research at communications@heigit.org – we would be happy to amplify your work.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The USD/KRW exchange rate fell to 1,362.7300 on July 4, 2025, down 0.06% from the previous session. Over the past month, the South Korean Won has weakened 0.53%, but it is up 1.08% over the last 12 months. South Korean Won - values, historical data, forecasts and news - updated in July 2025.
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete due to incompleteness of source documents), and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-interval". Cumulative case count time series consist of overlapping case count intervals starting on the same date but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicate this with an attribute for each count value, named "PartOfCumulativeCountSeries".
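Both recommended steps are straightforward with pandas. The sketch below works on a tiny made-up extract; the column names PartOfCumulativeCountSeries, PeriodStartDate, and CountValue follow the description above, but the real files carry many more attribute columns, and the weekly frequency here is an assumption for the example.

```python
import pandas as pd

# Hypothetical weekly case-count extract in the spirit of a Tycho file.
df = pd.DataFrame({
    "PeriodStartDate": pd.to_datetime(
        ["2019-01-06", "2019-01-13", "2019-01-27", "2019-01-06"]),
    "CountValue": [3, 0, 5, 8],
    "PartOfCumulativeCountSeries": [0, 0, 0, 1],
})

# Step 2: keep only fixed-interval counts, setting cumulative series aside.
fixed = df[df["PartOfCumulativeCountSeries"] == 0]

# Step 1: insert the weeks for which no count was reported. Unreported
# weeks become NaN (not zero; reported zeros are already in the data).
weekly = (fixed.set_index("PeriodStartDate")["CountValue"]
          .reindex(pd.date_range("2019-01-06", "2019-01-27", freq="7D")))

print(weekly.isna().sum())  # → 1 (the unreported week of 2019-01-20)
```

Whether the NaN weeks should then be treated as zero, interpolated, or excluded depends on the analysis, which is why the datasets leave them out rather than deciding for the user.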
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Recommended citation
Gütschow, J.; Günther, A.; Pflüger, M. (2021): The PRIMAP-hist national historical emissions time series v2.3.1 (1750-2019). zenodo. doi:10.5281/zenodo.5494497.
Gütschow, J.; Jeffery, L.; Gieseke, R.; Gebel, R.; Stevens, D.; Krapp, M.; Rocha, M. (2016): The PRIMAP-hist national historical emissions time series, Earth Syst. Sci. Data, 8, 571-603, doi:10.5194/essd-8-571-2016
Content
Abstract
The PRIMAP-hist dataset combines several published datasets to create a comprehensive set of greenhouse gas emission pathways for every country and Kyoto gas, covering the years 1750 to 2019 and all UNFCCC (United Nations Framework Convention on Climate Change) member states as well as most non-UNFCCC territories. The data resolve the main IPCC (Intergovernmental Panel on Climate Change) 2006 categories. For CO2, CH4, and N2O, subsector data are available for Energy, Industrial Processes and Product Use (IPPU), and Agriculture. Due to data availability and methodological issues, version 2.3.1 of the PRIMAP-hist dataset does not include emissions from Land Use, Land-Use Change, and Forestry (LULUCF) in the main file. LULUCF data are included in the file with an increased number of significant digits and have to be used with care.
The PRIMAP-hist v2.3.1 dataset is an updated version of
Gütschow, J.; Günther, A.; Pflüger, M. (2021): The PRIMAP-hist national historical emissions time series v2.3 (1750-2019). zenodo. doi:10.5281/zenodo.5175154
The Changelog indicates the most important changes. You can also check the issue tracker on github.com/JGuetschow/PRIMAP-hist for additional information on issues found after the release of the dataset.
Use of the dataset and full description
Before using the dataset, please read this document and the article describing the methodology, especially the section on uncertainties and the section on limitations of the method and use of the dataset.
Gütschow, J.; Jeffery, L.; Gieseke, R.; Gebel, R.; Stevens, D.; Krapp, M.; Rocha, M. (2016): The PRIMAP-hist national historical emissions time series, Earth Syst. Sci. Data, 8, 571-603, doi:10.5194/essd-8-571-2016
Please notify us (johannes.guetschow@pik-potsdam.de) if you use the dataset so that we can keep track of how it is used and take that into consideration when updating and improving the dataset.
When using this dataset or one of its updates, please cite the DOI of the precise version of the dataset used and also the data description article which this dataset is supplement to (see above). Please consider also citing the relevant original sources when using the PRIMAP-hist dataset. See the full citations in the References section further below.
Since version 2.3 we use the data formats developed for the PRIMAP2 climate policy analysis suite: PRIMAP2 on GitHub. The data are published both in the interchange format, which consists of a CSV file with the data and a YAML file with additional metadata, and in the native NetCDF-based format. For a detailed description of the data format, we refer to the PRIMAP2 documentation.
We have also, for the first time, included files with more than three significant digits. These files are mainly aimed at people doing policy analysis using the country-reported data scenario (HISTCR). Using the high-precision data, they can avoid questions about discrepancies with the reported data. The uncertainties of emissions data do not justify the additional significant digits, and they might give a false sense of accuracy, so please use this version of the dataset with extra care.
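As a rough illustration of working with the interchange CSV, the sketch below melts year columns into a tidy long format with pandas. The two-row excerpt, its column names, and the emission values are invented for the example; consult the PRIMAP2 documentation for the authoritative column layout, and note that the PRIMAP2 package itself provides proper readers for these files.

```python
import io
import pandas as pd

# Hypothetical excerpt in the spirit of the PRIMAP2 interchange format:
# one row per country/entity, one column per year. Values are illustrative,
# not real PRIMAP-hist numbers.
csv = io.StringIO(
    "area (ISO3),entity,unit,category (IPCC2006_PRIMAP),2018,2019\n"
    "DEU,CO2,Gg CO2 / yr,M.0.EL,752000,702000\n"
    "FRA,CO2,Gg CO2 / yr,M.0.EL,331000,324000\n"
)
wide = pd.read_csv(csv)

# Melt the per-year columns into a tidy (country, year, value) long format.
long = wide.melt(
    id_vars=["area (ISO3)", "entity", "unit", "category (IPCC2006_PRIMAP)"],
    var_name="year", value_name="emissions")

print(len(long))  # → 4 (2 countries x 2 years)
```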
Support
If you encounter possible errors or other things that should be noted, please check our issue tracker at github.com/JGuetschow/PRIMAP-hist and report your findings there. Please use the tag “v2.3.1” in any issue you create regarding this dataset.
If you need support in using the dataset or have any other questions regarding the dataset, please contact johannes.guetschow@pik-potsdam.de.
Sources
Files included in the dataset
For each dataset we have three files: the .nc file contains the data and metadata in the native PRIMAP2 netCDF based format. The .csv file contains the data in a csv format following the specifications of the PRIMAP2 interchange format. The metadata for the interchange format file is included in the .yaml file.
Notes
MIT License https://opensource.org/licenses/MIT
Translated into Korean with DeepL
All texts are translated with DeepL (machine translated).
Known issue: some data items are missing because of the DeepL plan and the processing method used. I used a very cheap plan; all data were merged into a single file and then split again with a small amount of code and by hand. This is a sample/test run of dataset creation with DeepL.
Original Dataset: totally-not-an-llm/EverythingLM-data-V2
EverythingLM V2 Dataset
EverythingLM V2 is a diverse instruct dataset… See the full description on the dataset page: https://huggingface.co/datasets/ziozzang/EverythingLM-data-V2-Ko.