This dataset was created by Balal H
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The primary objective of this project was to acquire historical shoreline information for the entire Northern Ireland coastline. A detailed understanding of the coast's shoreline position and geometry over annual to decadal time periods is essential for any management of the coast.

The historical shoreline analysis was based on all available Ordnance Survey maps and aerial imagery. The analysis looked at position and geometry over annual to decadal time periods, providing a dynamic picture of how the coastline has changed since the early 1800s. Once all datasets were collated, the data were interrogated using the ArcGIS package Digital Shoreline Analysis System (DSAS), a software package that enables a user to calculate rate-of-change statistics from multiple historical shoreline positions. Rate-of-change was calculated at 25 m intervals and displayed both statistically and spatially, allowing areas of retreat/accretion to be identified along any given stretch of coastline.

DSAS produces the following rate-of-change statistics:

Net Shoreline Movement (NSM) – the distance between the oldest and the youngest shorelines.

Shoreline Change Envelope (SCE) – a measure of the total change in shoreline movement, considering all available shoreline positions and reporting their distances without reference to their specific dates.

End Point Rate (EPR) – derived by dividing the distance of shoreline movement by the time elapsed between the oldest and the youngest shoreline positions.

Linear Regression Rate (LRR) – determines a rate-of-change statistic by fitting a least-squares regression to all shorelines at specific transects.

Weighted Linear Regression Rate (WLR) – calculates a weighted linear regression of shoreline change on each transect, taking shoreline uncertainty into account by giving more emphasis to shorelines with a smaller error.

The end product provided by Ulster University is an invaluable tool and digital asset that has helped to visualise shoreline change and assess approximate rates of historical change at any given coastal stretch on the Northern Ireland coast.
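As a rough illustration (not the actual DSAS implementation), the per-transect statistics listed above can be sketched in Python; the transect dates and positions below are hypothetical:

```python
# Hypothetical sketch of DSAS-style rate-of-change statistics for a single
# transect. Positions are signed distances (m) from a baseline; dates in years.
import numpy as np

def transect_stats(years, positions):
    years = np.asarray(years, dtype=float)
    pos = np.asarray(positions, dtype=float)
    order = np.argsort(years)
    years, pos = years[order], pos[order]
    nsm = pos[-1] - pos[0]              # Net Shoreline Movement
    sce = pos.max() - pos.min()         # Shoreline Change Envelope
    epr = nsm / (years[-1] - years[0])  # End Point Rate (m/yr)
    lrr = np.polyfit(years, pos, 1)[0]  # Linear Regression Rate (slope, m/yr)
    return {"NSM": nsm, "SCE": sce, "EPR": epr, "LRR": lrr}

# Illustrative retreating transect: 18 m of net landward movement since 1830
stats = transect_stats([1830, 1900, 1960, 2010], [0.0, -5.0, -12.0, -18.0])
```

Negative rates indicate retreat, positive rates accretion, matching the retreat/accretion mapping described above.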
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A common descriptive statistic in cluster analysis is the $R^2$ that measures the overall proportion of variance explained by the cluster means. This note highlights properties of the $R^2$ for clustering. In particular, we show that the $R^2$ can generally be artificially inflated by linearly transforming the data by "stretching" and by projecting. Also, the $R^2$ for clustering will often be a poor measure of clustering quality in high-dimensional settings. We also investigate the $R^2$ for clustering under misspecified models. Several simulation illustrations are provided highlighting weaknesses of the clustering $R^2$, especially in high-dimensional settings. A functional data example is given showing how the $R^2$ for clustering can vary dramatically depending on how the curves are estimated.
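A minimal sketch of the clustering $R^2$ discussed in the note, computed as the proportion of total variance explained by the cluster means; the two-cluster toy data are an illustrative assumption:

```python
# Clustering R^2 = 1 - (within-cluster SS / total SS), i.e. the proportion of
# total variance explained by the cluster means. Toy data for illustration.
import numpy as np

def clustering_r2(X, labels):
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand = X.mean(axis=0)
    tss = ((X - grand) ** 2).sum()      # total sum of squares
    wss = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
              for k in np.unique(labels))  # within-cluster sum of squares
    return 1.0 - wss / tss

X = np.array([[0.0], [1.0], [10.0], [11.0]])   # two well-separated clusters
labels = np.array([0, 0, 1, 1])
r2 = clustering_r2(X, labels)
```

Stretching one coordinate along the direction separating the cluster means increases the between-cluster share of variance, which is the inflation mechanism the note warns about.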
This dataset contains 2 Files and 2 Folders
Column 1 contains text data. The text is preprocessed and balanced: the data contains an equal number of non-toxic (toxicity = 0) and toxic (toxicity > 0) comments.
Column 2 contains float data. This column stores the toxicity score of the text data.
Column 1 contains text data. In this version of the file, we implemented some additional pre-processing techniques, such as spelling correction. This dataset is also balanced: it contains an equal number of non-toxic (toxicity = 0) and toxic (toxicity > 0) comments.
Column 2 contains float data. This column stores the toxicity score of the text data.
All the FastText word embeddings in this dataset were learned using Python's gensim library with window size = 4 and sg = 0, which selects the Continuous Bag of Words (CBOW) approach to learning word embeddings.
In CBOW, the primary task is to build a language model that correctly predicts the center word given the context words in which it appears. Consider the example sentence “the quick brown fox jumps over the lazy dog”: if we take the word “jumps” as the center word, its context is formed by the words in its vicinity. With a context size of 2, the context is given by brown, fox, over, the. CBOW uses these context words to predict the target word, jumps.
If you are interested, you can learn more about FastText from the resources attached below:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83, 3.52).
Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
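As an illustrative sketch (not the authors' code), the modeling pipeline described, RF regression tuned with 3-fold cross-validation plus a bootstrapped confidence interval for the error, might look like this on synthetic stand-in data:

```python
# Sketch of the described pipeline: Random Forest regression with 3-fold CV
# hyperparameter search and a percentile-bootstrap CI for the MAE.
# Synthetic data stands in for the 50-patient clinical dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # stand-ins for predictors such as Hb, CRP, ESR, age
y = X @ np.array([1.5, -1.0, 0.5, 0.8]) + rng.normal(scale=0.5, size=50)

# 3-fold CV over a small illustrative hyperparameter grid
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      {"n_estimators": [100], "max_depth": [3, None]}, cv=3)
search.fit(X, y)
pred = search.predict(X)

# Percentile bootstrap CI for the MAE (200 resamples)
maes = [mean_absolute_error(y[idx], pred[idx])
        for idx in (rng.integers(0, 50, 50) for _ in range(200))]
ci = (np.percentile(maes, 2.5), np.percentile(maes, 97.5))
```

The bootstrap resamples (prediction, truth) pairs, so the interval reflects uncertainty from the small sample size, which is the stated motivation for reporting CIs here.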
Binary variables are reported as proportions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and code used in a journal paper entitled "Geographically weighted regression based on a network weight matrix: a case study using urbanization driving force data in China", published in the International Journal of Geographical Information Science.
Abstract: Geographically weighted regression (GWR) is a classical modeling method for dealing with spatial non-stationarity. It incorporates the distance decay effect in space to fit local regression models, where distance is defined as Euclidean distance. Although this definition has been expanded, it remains focused on physical distance. However, in the era of globalization and informatization, where the phenomenon of remotely close association is common, physical distance may not reflect real spatial proximity, and GWR based on physical distance has clear limitations. This paper proposes a geographically weighted regression based on a network weight matrix (NWM GWR) model. This does not rely on geographical location modeling; instead, it uses network distance to measure the proximity between two regions and weights observations by improving the kernel function to achieve distance attenuation. We adopt the population mobility network to establish a network weight matrix, modeling China’s urbanization and its multidimensional driving factors using network autocorrelation and NWM GWR methods. Results show that the NWM GWR model has more accurate fit and better stability than ordinary least squares and GWR models, and better reveals relationships between variables, which makes it suitable for modeling economic and social systems more broadly.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset split into three files. The data were collected from 527 organic food consumers. The files for factor analysis and logistic regression contain all 527 respondents' data, while the file for clustering contains only the 401 respondents who consume organic food products, since clustering was carried out only for those consumers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The purpose of data mining analysis is always to find patterns in the data using techniques such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset; before doing any work on the data, it has to be pre-processed, which normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. In our project, after using clustering prior to classification, the performance did not improve much. The reason may be that the features we selected for clustering are not well suited to it. Because of the nature of the data, classification tasks provide more information to work with in terms of improving knowledge and overall performance metrics.

From the dimensionality reduction perspective: clustering is different from Principal Component Analysis, which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters to reduce the data dimension can lose a lot of information, since clustering techniques are based on a metric of 'distance', and at high dimensions Euclidean distance loses nearly all meaning. Therefore, "reducing" dimensionality by mapping data points to cluster numbers is not always good, since you may lose almost all the information.

From the creating new features perspective: clustering analysis creates labels based on the patterns in the data, which brings uncertainty into the data. When clustering is used prior to classification, the choice of the number of clusters strongly affects the performance of the clustering, and in turn the performance of the classification. If the subset of features we apply clustering to is well suited to it, it might increase the overall classification performance.

For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We deliberately did not fix the clustering outputs with a random_state, in order to see whether they were stable. Our assumption was that if the results vary highly from run to run, which they definitely did, the data may simply not cluster well with the selected methods at all. Basically, the ramification we saw was that applying clustering in the data preprocessing left our results not much better than random. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the models' real-world effectiveness and to revise the models from time to time as things change.
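A minimal sketch of the approach discussed, using k-means cluster assignments as an extra feature before classification; the synthetic data, cluster count, and classifier are illustrative assumptions (a fixed random_state is used here for reproducibility, unlike the stability experiment described above):

```python
# Compare cross-validated accuracy with and without a k-means cluster label
# added as a feature. Data, k=3, and the classifier are illustrative choices.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Baseline: classify on the raw features
base = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# Augmented: append the cluster assignment as one extra feature
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, labels])
aug = cross_val_score(LogisticRegression(max_iter=1000), X_aug, y, cv=5).mean()
```

Re-running without a fixed random_state and watching `labels` (and `aug`) shift between runs reproduces the instability check described in the text.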
Democracy Timeseries Data Release 3.0, January 2009
This dataset is in a country-year case format, suitable for time-series analysis. It contains data on the social, economic, and political characteristics of 191 nations, with over 600 variables from 1971 to 2007. It merges the indicators of democracy by Freedom House, Vanhanen, Polity IV, and Cheibub and Gandhi, plus selected institutional classifications and socio-economic indicators from the World Bank. New variables include the KOF Globalization Index and the new Norris-Inglehart Cosmopolitan Index. Note that you should check the original codebooks for the meaning and definition of each variable; the period covered also varies by series. Note that the Excel version is for Office 2007 only. This is the dataset used in the book Driving Democracy.
January 2009
Stored in Stata, SPSS, Excel and CSV.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data includes all datasets and code for adversarial validation in geospatial machine learning prediction and the corresponding experiments. Alongside the datasets (the Brazil Amazon basin AGB dataset and the synthetic species abundance dataset) and code, Readme.txt explains each file's meaning.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset collects a raw dataset and a processed dataset derived from the raw dataset. There is a document containing the analytical code for statistical analysis of the processed dataset in .Rmd format and .html format.
The study examined some aspects of mechanical performance of solid wood composites. We were interested in certain properties of solid wood composites made using different adhesives with different grain orientations at the bondline, then treated at different temperatures prior to testing.
Performance was tested by assessing fracture energy and critical fracture energy, lap shear strength, and compression strength of the composites. This document concerns only the fracture properties, which are the focus of the related paper.
Notes:
* the raw data is provided in this upload, but the processing is not addressed here.
* the authors of this document are a subset of the authors of the related paper.
* this document and the related data files were uploaded at the time of submission for review. An update providing the doi of the related paper will be provided when it is available.
We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package for researchers or practitioners to apply the topic-based segmentation model with unstructured texts (latent class regression with group variable selection) to their own datasets.

First, we provide R code to replicate the illustrative simulation study: see file 1. Second, we provide the user-friendly R package with a very simple example code to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R. Third, we provide a set of codes and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note that, due to Yelp's dataset terms of use and data size restrictions, we instead provide a link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6). Fourth, we provide a set of codes and datasets to replicate the empirical study with professor ratings reviews data: see file 5. Please see more details in the description text and comments of each file.

[A guide on how to use the code to reproduce each study in the paper]

1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: R source code to replicate the illustrative simulation study. Please run it from beginning to end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships, you will get the dendrograms of selected groups of variables in Figure 2. Computing time is approximately 20 to 30 minutes.

3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IV matrices for the customer-level segmentation study.

3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships. Computing time is approximately 3 to 4 hours.

4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IV matrices for the restaurant-level segmentation study.

4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships. Computing time is approximately 10 to 12 hours.

[Guidelines for running benchmark models in Table 6]

Unsupervised topic model: 'topicmodels' package in R. After determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors, and conduct prediction with regression.

Hierarchical topic model (HDP): 'gensimr' R package; 'model_hdp' function for identifying topics (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).

Supervised topic model: 'lda' R package; 'slda.em' function for training and 'slda.predict' for prediction.

Aggregate regression: 'lm' default function in R.

Latent class regression without variable selection: 'flexmix' function in the 'flexmix' R package. Run flexmix with a certain number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, conduct prediction of the dependent variable for each segment.

Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo (2012)'s package. Run the Kim et al. (2012) model with a certain number of segments (e.g., 3 segments in this study), then predict the dependent variable for each segment using the estimated coefficients and memberships. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the professor ratings reviews study. Computing time is approximately 10 hours.

[A list of the versions of R, packages, and computer...
The U.S. Geological Survey (USGS), in cooperation with the Federal Emergency Management Agency, Pennsylvania Department of Environmental Protection, Pennsylvania Department of Transportation, and Susquehanna River Basin Commission, prepared hydro-conditioned geographic information systems (GIS) layers for use in the Pennsylvania StreamStats application. These data were used to update the peak flow and low flow regression equations for Pennsylvania. This dataset consists of stream definition 900 cell threshold rasters for each 8-digit Hydrologic Unit Code (HUC) area in Pennsylvania, one of the layer types needed to delineate watersheds within the HUC-8 areas, merged into a single dataset. The 59 HUCs represented by this dataset are 02040101, 02040102, 02040103, 02040104, 02040105, 02040106, 02040201, 02040202, 02040203, 02040205, 02050101, 02050102, 02050103, 02050104, 02050105, 02050106, 02050107, 02050201, 02050202, 02050203, 02050204, 02050205, 02050206, 02050301, 02050302, 02050303, 02050304, 02050305, 02050306, 02060002, 02060003, 02070002, 02070003, 02070004, 02070009, 04110003, 04120101, 04130002, 05010001, 05010002, 05010003, 05010004, 05010005, 05010006, 05010007, 05010008, 05010009, 05020001, 05020002, 05020003, 05020004, 05020005, 05020006, 05030101, 05030102, 05030103, 05030104, 05030105, and 05030106.
This dataset supports the publication "Statistical simulation of ocean current patterns using autoregressive logistic regression (ALR) models: A case study in the Gulf of Mexico" (https://doi.org/10.1016/j.ocemod.2019.02.010). This dataset includes historical dynamic topography and surface currents from satellite altimetry (1991-01-01 to 2017-08-29), as well as processed views of the Gulf of Mexico Loop Current. The processed data include a time history of 84 empirical orthogonal functions (EOFs), the results of automated pattern identification (principal component analysis and k-means clustering), and the results of three (3) autoregressive logistic regression (ALR) models.
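A hedged sketch of the automated pattern identification step described (principal component analysis followed by k-means), run on a synthetic field rather than the altimetry data; the dimensions and cluster count are assumptions:

```python
# Pattern identification sketch: decompose a space-time field with PCA
# (an EOF-style decomposition), then cluster the principal-component time
# series with k-means. Synthetic data; shapes and k=4 are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
field = rng.normal(size=(500, 84))       # time steps x spatial points

pca = PCA(n_components=10)               # retain the 10 leading modes
pcs = pca.fit_transform(field)           # PC time series (one row per time step)

# Each time step is assigned to one of k recurring circulation patterns
patterns = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pcs)
```

Clustering in PC space rather than on the raw field keeps the distance computation in a low-dimensional, variance-ordered basis, which is the usual motivation for combining the two steps.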
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: This study aims to develop and compare different models to predict the Length of Stay (LoS) and the Prolonged Length of Stay (PLoS) of inpatients admitted through the emergency department (ED) in general patient settings. The aim is not to promote any specific model but rather to suggest a decision-supporting tool (i.e., a prediction framework).

Methods: We analyzed a dataset of patients admitted through the ED to the Sant'Orsola-Malpighi University Hospital of Bologna, Italy, between January 1 and October 26, 2022. PLoS was defined as any hospitalization with LoS longer than 6 days. We deployed six classification algorithms for predicting PLoS: Random Forest (RF), Support Vector Machines (SVM), Gradient Boosting (GB), AdaBoost, K-Nearest Neighbors (KNN), and logistic regression (LoR). We evaluated the performance of these models with the Brier score, the area under the ROC curve (AUC), accuracy, sensitivity (recall), specificity, precision, and F1-score. We further developed eight regression models for LoS prediction: Linear Regression (LR), including the penalized linear models Least Absolute Shrinkage and Selection Operator (LASSO), Ridge, and Elastic-net regression, as well as Support Vector Regression, RF regression, KNN, and eXtreme Gradient Boosting (XGBoost) regression. Model performance was measured by mean square error, mean absolute error, and mean relative error. The dataset was randomly split into a training set (70%) and a validation set (30%).

Results: A total of 12,858 eligible patients were included in our study, of whom 60.88% had a PLoS. The GB classifier best predicted PLoS (accuracy 75%, AUC 75.4%, Brier score 0.181), followed by the LoR classifier (accuracy 75%, AUC 75.2%, Brier score 0.182). These models were also adequately calibrated. Ridge and XGBoost regressions best predicted LoS, with the smallest total prediction error. The overall prediction error is between 6 and 7 days, meaning there is a 6-7 day mean difference between actual and predicted LoS.

Conclusion: Our results demonstrate the potential of machine learning-based methods to predict LoS and provide valuable insights into the risks behind prolonged hospitalizations. In addition to physicians' clinical expertise, the results of these models can be used as input to make informed decisions, such as predicting hospitalizations and enhancing the overall performance of a public healthcare system.
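As an illustrative sketch on synthetic data (not the hospital records), the classification evaluation described, a 70/30 split scored with the Brier score and AUC, might look like:

```python
# Evaluate two of the listed classifiers (gradient boosting and logistic
# regression) with a 70/30 split, Brier score, and AUC. Synthetic stand-in
# data; hyperparameters are illustrative defaults.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for model in (GradientBoostingClassifier(random_state=0),
              LogisticRegression(max_iter=1000)):
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    results[type(model).__name__] = {
        "brier": brier_score_loss(y_te, prob),  # lower is better
        "auc": roc_auc_score(y_te, prob),       # higher is better
    }
```

The Brier score rewards calibrated probabilities while AUC measures ranking quality, which is why the study reports both alongside threshold-based metrics.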
Logistic regression analysis to demonstrate variables associated with presence of psychiatric disorder in each of the six conditions (blank means that the odds ratio was not significant).
The U.S. Geological Survey (USGS), in cooperation with the Puerto Rico Environmental Quality Board, has compiled a series of geospatial datasets for Puerto Rico to be implemented into the USGS StreamStats application (https://streamstats.usgs.gov/ss/). These geospatial datasets, along with basin characteristics datasets for Puerto Rico published as a separate USGS data release (https://doi.org/10.5066/P9HK9SSQ), were used to delineate watersheds and develop the peak-flow and low-flow regression equations used by StreamStats. The geospatial datasets described herein are the stream definition rasters with a 900 stream cell threshold at a 10-m resolution. The flow accumulation grid is used as input to create this dense stream grid, which requires a flow accumulation of 900 pixels or greater to initiate a stream channel. A value of 1 is assigned to all cells equal to or greater than the threshold, and NoData to all other cells. Data are partitioned into four TIFF files, one for each of the four 8-digit Hydrologic Unit Code (HUC) areas for Puerto Rico: 21010002, 21010003, 21010004, and 21010005.
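The thresholding rule described can be sketched with numpy; the tiny array stands in for a real flow-accumulation raster, and the NoData value is an assumption:

```python
# Stream definition raster: cells whose flow accumulation meets the 900-cell
# threshold become 1; all other cells become NoData. The 3x3 array and the
# NoData sentinel value are illustrative assumptions.
import numpy as np

fac = np.array([[  10, 950,  20],
                [ 900, 899,   5],
                [1200,   0, 901]])     # flow accumulation (cell counts)

NODATA = -9999                         # hypothetical NoData sentinel
str900 = np.where(fac >= 900, 1, NODATA)
```

On real rasters the same element-wise rule is applied per pixel; only the array source (a TIFF read) and the NoData convention differ.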
The U.S. Geological Survey (USGS), in cooperation with the Illinois Center for Transportation and the Illinois Department of Transportation, prepared hydro-conditioned geographic information systems (GIS) layers for use in the Illinois StreamStats application. These data were used to delineate drainage basins and compute basin characteristics for updated peak flow and flow duration regression equations for Illinois. This dataset consists of raster grid files for elevation (dem), flow accumulation (fac), flow direction (fdr), and stream definition (str900) for each 8-digit Hydrologic Unit Code (HUC) area in Illinois merged into a single dataset. There are 51 full or partial HUC 8s represented by this data set: 04040002, 05120108, 05120109, 05120111, 05120112, 05120113, 05120114, 05120115, 05140202, 05140203, 05140204, 05140206, 07060005, 07080101, 07080104, 07090001, 07090002, 07090003, 07090004, 07090005, 07090006, 07090007, 07110001, 07110004, 07110009, 07120001, 07120002, 07120004 (0712003 was combined into this HUC), 07120005, 07120006, 07120007, 07130001, 07130002, 07130003, 07130004, 07130005, 07130006, 07130007, 07130008, 07130009, 07130010, 07130011, 07130012, 07140101, 07140105, 07140106, 07140108, 07140201, 07140202, 07140203, and 07140204.