2 datasets found

Classification of rare land cover types: Distinguishing annual and perennial...
plos.figshare.com
pdf
Updated Jun 1, 2023
Cite
Christina Bogner; Bumsuk Seo; Dorian Rohner; Björn Reineking (2023). Classification of rare land cover types: Distinguishing annual and perennial crops in an agricultural catchment in South Korea [Dataset]. http://doi.org/10.1371/journal.pone.0190476
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0190476
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Christina Bogner; Bumsuk Seo; Dorian Rohner; Björn Reineking
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
South Korea
Description
Many environmental data are inherently imbalanced, with a few majority land use and land cover types dominating over rare ones. In cultivated ecosystems, minority classes are often the target because they might indicate the onset of a land use change. Most standard classifiers perform best on a balanced distribution of classes and fail to detect minority classes. We used the synthetic minority oversampling technique (SMOTE) with Random Forest to classify land cover classes in a small agricultural catchment in South Korea using MODIS time series. This area faces a major soil erosion problem, and policy measures encourage farmers to replace annual with perennial crops to mitigate it. Our major goal was therefore to improve the classification performance on annual and perennial crops. We compared four different classification scenarios on original imbalanced and synthetically oversampled balanced data to quantify the effect of SMOTE on classification performance. SMOTE substantially increased the true positive rate of all oversampled minority classes. However, the performance on minority classes remained lower than on the majority class. We attribute this result to a class overlap already present in the original data set that is not resolved by SMOTE. Our results show that resampling algorithms could help to derive more accurate land use and land cover maps from freely available data. These maps can be used to provide information on the distribution of land use classes in heterogeneous agricultural areas and could potentially benefit decision making.
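The oversampling idea named in this description can be sketched in a few lines. This is a minimal, standard-library illustration of the general SMOTE principle (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation; the function name `smote`, the parameter `k`, and the two-feature toy data are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE interpolation step)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # relative position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical toy data: 3 minority samples to be balanced against 8 majority samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
n_majority = 8
new_points = smote(minority, n_new=n_majority - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point lies between two existing minority samples, the method cannot separate classes that already overlap in feature space, which is consistent with the limitation the description reports.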
Data from: Investigating the contributors to hit-and-run crashes using...
figshare.com
xlsx
Updated Oct 7, 2024
Cite
Gen Li (2024). Investigating the contributors to hit-and-run crashes using gradient boosting decision trees [Dataset]. http://doi.org/10.6084/m9.figshare.27178305.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.27178305.v1
Dataset updated
Oct 7, 2024
Dataset provided by
figshare
Authors
Gen Li
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper uses the 2021 traffic crash data from the NHTSA CRSS as the sample for model training and validation. The CRSS collects crash report data provided by police departments from all 50 states in the United States. It details the factors of each traffic crash, including crash, driver, vehicle, road, and environmental information. The crash data provided by CRSS include details such as the location, time, cause, and type of crash; the driver's age, gender, attention level, injury status, and risky driving behavior; the vehicle type, usage, and damage; and hit-and-run status. However, because the data are recorded in separate files and contain systematic errors and redundant information, the CRSS 2021 data undergo the following merging and filtering steps:
1) Separately recorded data were matched and merged on the unique case number "CASENUM" in the dataset.
2) Records with missing values in critical variables (e.g., whether the crash involved a hit-and-run) were removed to avoid bias in the analysis. For non-critical variables, missing values were imputed according to the variable type: mean imputation for continuous variables (e.g., speed limits) and mode imputation for categorical variables (e.g., weather, road surface conditions).
3) Noise in the dataset arises from both human error in crash reporting and random fluctuations in recorded variables. We used z-scores to detect and remove extreme outliers in numerical variables (e.g., speed limits, crash angle): data points with a z-score beyond ±3 were considered outliers and excluded from the analysis. To handle noisy fluctuations in continuous variables (e.g., speed limits), we applied a symmetrical exponential moving average (EMA) filter.
After processing, the CRSS 2021 data include a total of 54,187 crashes, of which 5,944 are hit-and-run crashes, accounting for 10.97% of all crashes. Hit-and-run crashes therefore make up a relatively small proportion of the dataset, which creates a serious class imbalance in the binary classification target, and data balancing is applied to the target variable during parameter calibration. To address this issue, we used the resampling techniques available in the data mining software. Specifically, random undersampling was applied to the majority class (non-hit-and-run crashes), while the Synthetic Minority Over-sampling Technique (SMOTE) was used for the minority class. This ensured a balanced class distribution in the training set, improving model performance and preventing the classifier from being biased toward the majority class.
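The merging, imputation, outlier, and undersampling steps described above can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions, not the author's pipeline: only "CASENUM" comes from the description, while the other field names (`HIT_RUN`, `SPEED_LIMIT`), the one-row-per-case merge, the population standard deviation, and all toy values are hypothetical. The EMA filter and the SMOTE step are left out.

```python
import random
import statistics


def merge_on_casenum(crashes, vehicles):
    """Join two record lists on CASENUM (assumes one vehicle row per case)."""
    by_case = {v["CASENUM"]: v for v in vehicles}
    return [{**c, **by_case[c["CASENUM"]]} for c in crashes if c["CASENUM"] in by_case]


def impute(values, kind):
    """Fill None with the mean (continuous) or the mode (categorical)."""
    present = [v for v in values if v is not None]
    fill = statistics.mean(present) if kind == "continuous" else statistics.mode(present)
    return [fill if v is None else v for v in values]


def zscore_keep(values, threshold=3.0):
    """Return a keep-mask: True where |z| <= threshold."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std; an assumption
    if sigma == 0:
        return [True] * len(values)
    return [abs((v - mu) / sigma) <= threshold for v in values]


def undersample_majority(labels, target_size, seed=0):
    """Keep every minority row (label 1) plus a random subset of
    target_size majority rows (label 0); returns sorted row indices."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == 0]
    minority = [i for i, y in enumerate(labels) if y == 1]
    return sorted(rng.sample(majority, target_size) + minority)


# Hypothetical toy records; field names other than CASENUM are made up.
merged = merge_on_casenum(
    [{"CASENUM": 1, "HIT_RUN": 0}, {"CASENUM": 2, "HIT_RUN": 1}],
    [{"CASENUM": 2, "SPEED_LIMIT": 45}, {"CASENUM": 1, "SPEED_LIMIT": None}],
)
speeds = impute([30, None, 50], "continuous")
weather = impute(["rain", None, "clear", "rain"], "categorical")
keep = zscore_keep([50] * 19 + [500])  # the single extreme value is flagged
kept_rows = undersample_majority([0] * 9 + [1] * 3, target_size=3)

# Share of hit-and-run crashes reported in the description, in percent.
hit_run_share = round(100 * 5944 / 54187, 2)
```

The last line reproduces the 10.97% figure from the two counts given in the description. Note that a z-score cutoff of 3 can only flag a point when the sample is large enough, which is why the toy list uses 20 values.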
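12 scholarly articles cite the dataset "Classification of rare land cover types: Distinguishing annual and perennial crops in an agricultural catchment in South Korea" (per Google Scholar).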
