11 datasets found
  1. d

    Data from: Best Management Practices Statistical Estimator (BMPSE) Version...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 27, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2025). Best Management Practices Statistical Estimator (BMPSE) Version 1.2.0 [Dataset]. https://catalog.data.gov/dataset/best-management-practices-statistical-estimator-bmpse-version-1-2-0
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Description

    The Best Management Practices Statistical Estimator (BMPSE) version 1.2.0 was developed by the U.S. Geological Survey (USGS), in cooperation with the Federal Highway Administration (FHWA) Office of Project Delivery and Environmental Review to provide planning-level information about the performance of structural best management practices for decision makers, planners, and highway engineers to assess and mitigate possible adverse effects of highway and urban runoff on the Nation's receiving waters (Granato 2013, 2014; Granato and others, 2021). The BMPSE was assembled by using a Microsoft Access® database application to facilitate calculation of BMP performance statistics. Granato (2014) developed quantitative methods to estimate values of the trapezoidal-distribution statistics, correlation coefficients, and the minimum irreducible concentration (MIC) from available data. Granato (2014) developed the BMPSE to hold and process data from the International Stormwater Best Management Practices Database (BMPDB, www.bmpdatabase.org). Version 1.0 of the BMPSE contained a subset of the data from the 2012 version of the BMPDB; the current version of the BMPSE (1.2.0) contains a subset of the data from the December 2019 version of the BMPDB. Selected data from the BMPDB were screened for import into the BMPSE in consultation with Jane Clary, the data manager for the BMPDB. Modifications included identifying water quality constituents, making measurement units consistent, identifying paired inflow and outflow values, and converting BMPDB water quality values set as half the detection limit back to the detection limit. Total polycyclic aromatic hydrocarbons (PAH) values were added to the BMPSE from BMPDB data; they were calculated from individual PAH measurements at sites with enough data to calculate totals. The BMPSE tool can sort and rank the data, calculate plotting positions, calculate initial estimates, and calculate potential correlations to facilitate the distribution-fitting process (Granato, 2014). For water-quality ratio analysis the BMPSE generates the input files and the list of filenames for each constituent within the Graphical User Interface (GUI). The BMPSE calculates the Spearman’s rho (ρ) and Kendall’s tau (τ) correlation coefficients with their respective 95-percent confidence limits and the probability that each correlation coefficient value is not significantly different from zero by using standard methods (Granato, 2014). If the 95-percent confidence limit values are of the same sign, then the correlation coefficient is statistically different from zero. For hydrograph extension, the BMPSE calculates ρ and τ between the inflow volume and the hydrograph-extension values (Granato, 2014). For volume reduction, the BMPSE calculates ρ and τ between the inflow volume and the ratio of outflow to inflow volumes (Granato, 2014). For water-quality treatment, the BMPSE calculates ρ and τ between the inflow concentrations and the ratio of outflow to inflow concentrations (Granato, 2014; 2020). The BMPSE also calculates ρ between the inflow and the outflow concentrations when a water-quality treatment analysis is done. The current version (1.2.0) of the BMPSE also has the option to calculate urban-runoff quality statistics from inflows to BMPs by using computer code developed for the Highway Runoff Database (Granato and Cazenas, 2009;Granato, 2019). Granato, G.E., 2013, Stochastic empirical loading and dilution model (SELDM) version 1.0.0: U.S. Geological Survey Techniques and Methods, book 4, chap. C3, 112 p., CD-ROM https://pubs.usgs.gov/tm/04/c03 Granato, G.E., 2014, Statistics for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater runoff best management practices (BMPs): U.S. Geological Survey Scientific Investigations Report 2014–5037, 37 p., http://dx.doi.org/10.3133/sir20145037. Granato, G.E., 2019, Highway-Runoff Database (HRDB) Version 1.1.0: U.S. Geological Survey data release, https://doi.org/10.5066/P94VL32J. Granato, G.E., and Cazenas, P.A., 2009, Highway-Runoff Database (HRDB Version 1.0)--A data warehouse and preprocessor for the stochastic empirical loading and dilution model: Washington, D.C., U.S. Department of Transportation, Federal Highway Administration, FHWA-HEP-09-004, 57 p. https://pubs.usgs.gov/sir/2009/5269/disc_content_100a_web/FHWA-HEP-09-004.pdf Granato, G.E., Spaetzel, A.B., and Medalie, L., 2021, Statistical methods for simulating structural stormwater runoff best management practices (BMPs) with the stochastic empirical loading and dilution model (SELDM): U.S. Geological Survey Scientific Investigations Report 2020–5136, 41 p., https://doi.org/10.3133/sir20205136

  2. Data_Sheet_1_Applying Dimensionality Reduction Techniques in Source-Space...

    • frontiersin.figshare.com
    pdf
    Updated Jun 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nitikorn Srisrisawang; Gernot R. Müller-Putz (2023). Data_Sheet_1_Applying Dimensionality Reduction Techniques in Source-Space Electroencephalography via Template and Magnetic Resonance Imaging-Derived Head Models to Continuously Decode Hand Trajectories.PDF [Dataset]. http://doi.org/10.3389/fnhum.2022.830221.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Frontiers Mediahttp://www.frontiersin.org/
    Authors
    Nitikorn Srisrisawang; Gernot R. Müller-Putz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Several studies showed evidence supporting the possibility of hand trajectory decoding from low-frequency electroencephalography (EEG). However, the decoding in the source space via source localization is scarcely investigated. In this study, we tried to tackle the problem of collinearity due to the higher number of signals in the source space by two folds: first, we selected signals in predefined regions of interest (ROIs); second, we applied dimensionality reduction techniques to each ROI. The dimensionality reduction techniques were computing the mean (Mean), principal component analysis (PCA), and locality preserving projections (LPP). We also investigated the effect of decoding between utilizing a template head model and a subject-specific head model during the source localization. The results indicated that applying source-space decoding with PCA yielded slightly higher correlations and signal-to-noise (SNR) ratios than the sensor-space approach. We also observed slightly higher correlations and SNRs when applying the subject-specific head model than the template head model. However, the statistical tests revealed no significant differences between the source-space and sensor-space approaches and no significant differences between subject-specific and template head models. The decoder with Mean and PCA utilizes information mainly from precuneus and cuneus to decode the velocity kinematics similarly in the subject-specific and template head models.

  3. d

    McCall Glacier mass balance and GPS survey data 2003-2014

    • search.dataone.org
    Updated Aug 22, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Matt Nolan (2023). McCall Glacier mass balance and GPS survey data 2003-2014 [Dataset]. https://search.dataone.org/view/urn%3Auuid%3A17355550-0cf8-4a59-8c43-0013e829b87f
    Explore at:
    Dataset updated
    Aug 22, 2023
    Dataset provided by
    Arctic Data Center
    Authors
    Matt Nolan
    Time period covered
    Jan 1, 2003 - Jan 1, 2014
    Area covered
    Variables measured
    Pole #, Pole ID, latitude, longitude, _ Elevation, HAE Elevation, Snow Depth (m), MB Measure Date, Pole Length (m), GPS Measure Date, and 6 more
    Description

    This archive includes all raw mass balance and GPS data from survey poles on McCall Glacier, arctic Alaska, from 2003 to 2014 and all reductions of it through 2022. McCall Glacier has been studied since 1957 and the research program there is one of the most comprehensive of any glacier worldwide. This data archive is organized by field campaign, twice annually, with all known raw and reduced data placed within the campaign's folder along with a reduced spreadsheet and notes for each campaign. A single reduced master spreadsheet captures the reduced data from all campaigns. Both the mass balance and velocity data time-series were analyzed in detail in this master spreadsheet and all corrections annotated within a 460 page PDF, which describes the data collection methods, the data reduction methods, and the data archive itself. Corrections to the raw data discovered in the time-series analysis, as described within the PDF, were placed within a corrections spreadsheet designed to be added cell by cell to the master spreadsheet. The PDF also includes various plots for each survey pole's mass balance and motion to facilitate quick visualization of the data. Scans of fieldbooks and online blogs related to individual campaigns are captured here as well to provide context for the measurements.

  4. Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    pdf
    Updated Jun 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Canada, United States
    Description

    Snapshot img

    Anomaly Detection Market Size 2025-2029

    The anomaly detection market size is valued to increase by USD 4.44 billion, at a CAGR of 14.4% from 2024 to 2029. Anomaly detection tools gaining traction in BFSI will drive the anomaly detection market.

    Major Market Trends & Insights

    North America dominated the market and accounted for a 43% growth during the forecast period.
    By Deployment - Cloud segment was valued at USD 1.75 billion in 2023
    By Component - Solution segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 173.26 million
    Market Future Opportunities: USD 4441.70 million
    CAGR from 2024 to 2029 : 14.4%
    

    Market Summary

    Anomaly detection, a critical component of advanced analytics, is witnessing significant adoption across various industries, with the financial services sector leading the charge. The increasing incidence of internal threats and cybersecurity frauds necessitates the need for robust anomaly detection solutions. These tools help organizations identify unusual patterns and deviations from normal behavior, enabling proactive response to potential threats and ensuring operational efficiency. For instance, in a supply chain context, anomaly detection can help identify discrepancies in inventory levels or delivery schedules, leading to cost savings and improved customer satisfaction. In the realm of compliance, anomaly detection can assist in maintaining regulatory adherence by flagging unusual transactions or activities, thereby reducing the risk of penalties and reputational damage.
    According to recent research, organizations that implement anomaly detection solutions experience a reduction in error rates by up to 25%. This improvement not only enhances operational efficiency but also contributes to increased customer trust and satisfaction. Despite these benefits, challenges persist, including data quality and the need for real-time processing capabilities. As the market continues to evolve, advancements in machine learning and artificial intelligence are expected to address these challenges and drive further growth.
    

    What will be the Size of the Anomaly Detection Market during the forecast period?

    Get Key Insights on Market Forecast (PDF) Request Free Sample

    How is the Anomaly Detection Market Segmented ?

    The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
    
      Cloud
      On-premises
    
    
    Component
    
      Solution
      Services
    
    
    End-user
    
      BFSI
      IT and telecom
      Retail and e-commerce
      Manufacturing
      Others
    
    
    Technology
    
      Big data analytics
      AI and ML
      Data mining and business intelligence
    
    
    Geography
    
      North America
    
        US
        Canada
        Mexico
    
    
      Europe
    
        France
        Germany
        Spain
        UK
    
    
      APAC
    
        China
        India
        Japan
    
    
      Rest of World (ROW)
    

    By Deployment Insights

    The cloud segment is estimated to witness significant growth during the forecast period.

    The market is witnessing significant growth, driven by the increasing adoption of advanced technologies such as machine learning algorithms, predictive modeling tools, and real-time monitoring systems. Businesses are increasingly relying on anomaly detection solutions to enhance their root cause analysis, improve system health indicators, and reduce false positives. This is particularly true in sectors where data is generated in real-time, such as cybersecurity threat detection, network intrusion detection, and fraud detection systems. Cloud-based anomaly detection solutions are gaining popularity due to their flexibility, scalability, and cost-effectiveness.

    This growth is attributed to cloud-based solutions' quick deployment, real-time data visibility, and customization capabilities, which are offered at flexible payment options like monthly subscriptions and pay-as-you-go models. Companies like Anodot, Ltd, Cisco Systems Inc, IBM Corp, and SAS Institute Inc provide both cloud-based and on-premise anomaly detection solutions. Anomaly detection methods include outlier detection, change point detection, and statistical process control. Data preprocessing steps, such as data mining techniques and feature engineering processes, are crucial in ensuring accurate anomaly detection. Data visualization dashboards and alert fatigue mitigation techniques help in managing and interpreting the vast amounts of data generated.

    Network traffic analysis, log file analysis, and sensor data integration are essential components of anomaly detection systems. Additionally, risk management frameworks, drift detection algorithms, time series forecasting, and performance degradation detection are vital in maintaining system performance and capacity planning.

  5. m

    Data for: Trace element associations in hydrothermal pyrite at the Geita...

    • data.mendeley.com
    Updated Dec 5, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Matthew Van Ryt (2019). Data for: Trace element associations in hydrothermal pyrite at the Geita Hill gold deposit, Tanzania, revealed through LA-ICP-MS mapping [Dataset]. http://doi.org/10.17632/wpw9xssjzm.1
    Explore at:
    Dataset updated
    Dec 5, 2019
    Authors
    Matthew Van Ryt
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Tanzania, Geita
    Description

    Database contains LA-ICP-MS data and data reduction techniques relevant to the submitted article "Trace element associations in hydrothermal pyrite at the Geita Hill gold deposit, Tanzania, revealed through LA-ICP-MS mapping", along with a contents file.

    Appendix 1 contains the LA-ICP-MS spot analysis data cited in the article Appendix 2 contains all LA-ICP-MS data processed for this research. Appendix 3 contains all La-ICP-MS transects processed for this research. All transects were ablated through pyrite grains. Appendix 4 contains the 9 LA-ICP-MS trace element images as a stand-alone ArcMap database file. Instructions for reading this file are included in the contents PDF. Appendix 5 contains the data reduction methodology used to create the trace element images. Appendix 6 contains the raw data underpinning the trace element images.

  6. c

    Data from: Smart metering and energy access programs: an approach to energy...

    • esango.cput.ac.za
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bennour Bacar (2023). Smart metering and energy access programs: an approach to energy poverty reduction in sub-Saharan Africa [Dataset]. http://doi.org/10.25381/cput.22264042.v1
    Explore at:
    Dataset updated
    May 31, 2023
    Dataset provided by
    Cape Peninsula University of Technology
    Authors
    Bennour Bacar
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Sub-Saharan Africa
    Description

    Ethical clearance reference number: refer to the uploaded document Ethics Certificate.pdf.

    General (0)

    0 - Built diagrams and figures.pdf: diagrams and figures used for the thesis

    Analysis of country data (1)

    0 - Country selection.xlsx: In this analysis the sub-Saharan country (Niger) is selected based on the kWh per capita data obtained from sources such as the United Nations and the World Bank. Other data used from these sources includes household size and electricity access. Some household data was projected using linear regression. Sample sizes VS error margins were also analyzed for the selection of a smaller area within the country.

    Smart metering experiment (2)

    The figures (PNG, JPG, PDF) include:

        - The experiment components and assembly
        - The use of device (meter and modem) softwar tools to program and analyse data
        - Phasor and meter detail
        - Extracted reports and graphs from the MDMS
    

    The datasets (CSV, XLSX) include:

        - Energy load profile and register data recorded by the smart meter and collected by both meter configuration and MDM applications.
        - Data collected also includes events, alarm and QoS data.
    

    Data applicability to SEAP (3)

    3 - Energy data and SEAP.pdf: as part of the Smart Metering VS SEAP framework analysis, a comparison between SEAP's data requirements, the applicable energy data to those requirements, the benefits, and the calculation of indicators where applicable. 3 - SEAP indicators.xlsx: as part of the Smart Metering VS SEAP framework analysis, the applicable calculation of indicators for SEAP's data requirements.

    Load prediction by machine learning (4)

    The coding (IPYNB, PY, HTML, ZIP) shows the preparation and exploration of the energy data to train the machine learning model. The datasets (CSV, XLSX), sequentially named, are part of the process of extracting, transforming and loading the data into a machine learning algorithm, identifying the best regression model based on metrics, and predicting the data.

    HRES analysis and optimization (5)

    The figures (PNG, JPG, PDF) include:

        - Household load, based on the energy data from the smart metering experiment and the machine learning exercise
        - Pre-defined/synthetic load, provided by the software when no external data (household load) is available, and
        - The HRES designed
        - Application-generated reports with the results of the analysis, for both best case HRES and fully renewable scenarios.
    

    The datasets (XLSX) include the 12-month input load for the simulation, and the input/output analysis and calculations. 5 - Gorou_Niger_20220529_v3.homer: software (Homer Pro) file with the simulated HRES

    · Conferences (6)

    6 – IEEE_MISTA_2022_paper_51.pdf: paper (research in progress) presented at the IEEE MISTA 2022 conference, occurred in March-2022, and published in the respective proceeding, 6 - IEEE_MISTA_2022_proceeding.pdf. 6 - ITAS_2023.pdf: paper (final research) recently presented at the ITAS 2023 conference in Doha, Qatar, in March-2023. 6 - Smart Energy Seminar 2023.pptx: PowerPoint slide version of the paper, recently presented at the Smart Energy Seminar held at CPUT, in March-2023.

  7. datasheet1_The Spectral Underpinning of word2vec.pdf

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ariel Jaffe; Yuval Kluger; Ofir Lindenbaum; Jonathan Patsenker; Erez Peterfreund; Stefan Steinerberger (2023). datasheet1_The Spectral Underpinning of word2vec.pdf [Dataset]. http://doi.org/10.3389/fams.2020.593406.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers Mediahttp://www.frontiersin.org/
    Authors
    Ariel Jaffe; Yuval Kluger; Ofir Lindenbaum; Jonathan Patsenker; Erez Peterfreund; Stefan Steinerberger
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Word2vec introduced by Mikolov et al. is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, a strong theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.

  8. i

    Household Health Survey 2012-2013, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Jun 26, 2017
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Central Statistical Organization (CSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://catalog.ihsn.org/index.php/catalog/6937
    Explore at:
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Central Statistical Organization (CSO)
    Economic Research Forum
    Kurdistan Regional Statistics Office (KRSO)
    Time period covered
    2012 - 2013
    Area covered
    Iraq
    Description

    Abstract

    The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

    ----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

    Iraq is considered a leader in household expenditure and income surveys where the first was conducted in 1946 followed by surveys in 1954 and 1961. After the establishment of Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years in (1971/ 1972, 1976, 1979, 1984/ 1985, 1988, 1993, 2002 / 2007). Implementing the cooperation between CSO and WB, Central Statistical Organization (CSO) and Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    The survey has six main objectives. These objectives are:

    1. Provide data for poverty analysis and measurement and monitor, evaluate and update the implementation Poverty Reduction National Strategy issued in 2009.
    2. Provide comprehensive data system to assess household social and economic conditions and prepare the indicators related to the human development.
    3. Provide data that meet the needs and requirements of national accounts.
    4. Provide detailed indicators on consumption expenditure that serve making decision related to production, consumption, export and import.
    5. Provide detailed indicators on the sources of households and individuals income.
    6. Provide data necessary for formulation of a new consumer price index number.

    The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

    Geographic coverage

    National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    ----> Design:

    Sample size was (25488) household for the whole Iraq, 216 households for each district of 118 districts, 2832 clusters each of which includes 9 households distributed on districts and governorates for rural and urban.

    ----> Sample frame:

    Listing and numbering results of 2009-2010 Population and Housing Survey were adopted in all the governorates including Kurdistan Region as a frame to select households, the sample was selected in two stages: Stage 1: Primary sampling unit (blocks) within each stratum (district) for urban and rural were systematically selected with probability proportional to size to reach 2832 units (cluster). Stage two: 9 households from each primary sampling unit were selected to create a cluster, thus the sample size of total survey clusters was 25488 households distributed on the governorates, 216 households in each district.

    ----> Sampling Stages:

    In each district, the sample was selected in two stages: Stage 1: based on 2010 listing and numbering frame 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown urban and rural and geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames of each stages can be developed based on 2010 building listing and numbering without updating household lists. In some small districts, random selection processes of primary sampling may lead to select less than 24 units therefore a sampling unit is selected more than once , the selection may reach two cluster or more from the same enumeration unit when it is necessary.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    ----> Preparation:

    The questionnaire of 2006 survey was adopted in designing the questionnaire of 2012 survey on which many revisions were made. Two rounds of pre-test were carried out. Revision were made based on the feedback of field work team, World Bank consultants and others, other revisions were made before final version was implemented in a pilot survey in September 2011. After the pilot survey implemented, other revisions were made in based on the challenges and feedbacks emerged during the implementation to implement the final version in the actual survey.

    ----> Questionnaire Parts:

    The questionnaire consists of four parts each with several sections: Part 1: Socio – Economic Data: - Section 1: Household Roster - Section 2: Emigration - Section 3: Food Rations - Section 4: housing - Section 5: education - Section 6: health - Section 7: Physical measurements - Section 8: job seeking and previous job

    Part 2: Monthly, Quarterly and Annual Expenditures: - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days). - Section 10 : Expenditures on Non – Food Commodities and Services (past 90 days). - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months). - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days). - Section 12, Table 1: Meals Had Within the Residential Unit. - Section 12, table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

    Part 3: Income and Other Data: - Section 13: Job - Section 14: paid jobs - Section 15: Agriculture, forestry and fishing - Section 16: Household non – agricultural projects - Section 17: Income from ownership and transfers - Section 18: Durable goods - Section 19: Loans, advances and subsidies - Section 20: Shocks and strategy of dealing in the households - Section 21: Time use - Section 22: Justice - Section 23: Satisfaction in life - Section 24: Food consumption during past 7 days

    Part 4: Diary of Daily Expenditures: Diary of expenditure is an essential component of this survey. It is left at the household to record all the daily purchases such as expenditures on food and frequent non-food items such as gasoline, newspapers…etc. during 7 days. Two pages were allocated for recording the expenditures of each day, thus the roster will be consists of 14 pages.

    Cleaning operations

    ----> Raw Data:

    Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages: 1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct. 2. Local Supervisor: Checks to make sure that questions has been correctly completed. 3. Statistical analysis: After exporting data files from excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values in addition to auditing some variables. 4. World Bank consultants in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

    ----> Harmonized Data:

    • The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.
    • The harmonization process starts with raw data files received from the Statistical Office.
    • A program is generated for each dataset to create harmonized variables.
    • Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

    Response rate

    Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. Number of households refused to response was 305, response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%) while the lowest rates were in Sulaimaniya (92%).

  9. f

    Data_Sheet_1_NeuralEE: A GPU-Accelerated Elastic Embedding Dimensionality...

    • frontiersin.figshare.com
    pdf
    Updated Jun 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jiankang Xiong; Fuzhou Gong; Lin Wan; Liang Ma (2023). Data_Sheet_1_NeuralEE: A GPU-Accelerated Elastic Embedding Dimensionality Reduction Method for Visualizing Large-Scale scRNA-Seq Data.PDF [Dataset]. http://doi.org/10.3389/fgene.2020.00786.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Jiankang Xiong; Fuzhou Gong; Lin Wan; Liang Ma
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dramatic increase in amount and size of single-cell RNA sequencing data calls for more efficient and scalable dimensional reduction and visualization tools. Here, we design a GPU-accelerated method, NeuralEE, which aggregates the advantages of elastic embedding and neural network. We show that NeuralEE is both scalable and generalizable in dimensional reduction and visualization of large-scale scRNA-seq data. In addition, the GPU-based implementation of NeuralEE makes it applicable to limited computational resources while maintains high performance, as it takes only half an hour to visualize 1.3 million mice brain cells, and NeuralEE has generalizability for integrating newly generated data.

  10. f

    Data Sheet 1_Evaluating machine learning pipelines for multimodal...

    • figshare.com
    pdf
    Updated Jun 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shailesh Appukuttan; Aude-Marie Grapperon; Mounir Mohamed El Mendili; Hugo Dary; Maxime Guye; Annie Verschueren; Jean-Philippe Ranjeva; Shahram Attarian; Wafaa Zaaraoui; Matthieu Gilson (2025). Data Sheet 1_Evaluating machine learning pipelines for multimodal neuroimaging in small cohorts: an ALS case study.pdf [Dataset]. http://doi.org/10.3389/fninf.2025.1568116.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 13, 2025
    Dataset provided by
    Frontiers
    Authors
    Shailesh Appukuttan; Aude-Marie Grapperon; Mounir Mohamed El Mendili; Hugo Dary; Maxime Guye; Annie Verschueren; Jean-Philippe Ranjeva; Shahram Attarian; Wafaa Zaaraoui; Matthieu Gilson
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advancements in machine learning hold great promise for the analysis of multimodal neuroimaging data. They can help identify biomarkers and improve diagnosis for various neurological disorders. However, the application of such techniques for rare and heterogeneous diseases remains challenging due to small-cohorts available for acquiring data. Efforts are therefore commonly directed toward improving the classification models, in an effort to optimize outcomes given the limited data. In this study, we systematically evaluated the impact of various machine learning pipeline configurations, including scaling methods, feature selection, dimensionality reduction, and hyperparameter optimization. The efficacy of such components in the pipeline was evaluated on classification performance using multimodal MRI data from a cohort of 16 ALS patients and 14 healthy controls. Our findings reveal that, while certain pipeline components, such as subject-wise feature normalization, help improve classification outcomes, the overall influence of pipeline refinements on performance is modest. Feature selection and dimensionality reduction steps were found to have limited utility, and the choice of hyperparameter optimization strategies produced only marginal gains. Our results suggest that, for small-cohort studies, the emphasis should shift from extensive tuning of these pipelines to addressing data-related limitations, such as progressively expanding cohort size, integrating additional modalities, and maximizing the information extracted from existing datasets. This study provides a methodological framework to guide future research and emphasizes the need for dataset enrichment to improve clinical utility.

  11. Table1_A comparative study in class imbalance mitigation when working with...

    • frontiersin.figshare.com
    pdf
    Updated Mar 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rawan S. Abdulsadig; Esther Rodriguez-Villegas (2024). Table1_A comparative study in class imbalance mitigation when working with physiological signals.pdf [Dataset]. http://doi.org/10.3389/fdgth.2024.1377165.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Mar 26, 2024
    Dataset provided by
    Frontiers Mediahttp://www.frontiersin.org/
    Authors
    Rawan S. Abdulsadig; Esther Rodriguez-Villegas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Class imbalance is a common challenge that is often faced when dealing with classification tasks aiming to detect medical events that are particularly infrequent. Apnoea is an example of such events. This challenge can however be mitigated using class rebalancing algorithms. This work investigated 10 widely used data-level class imbalance mitigation methods aiming towards building a random forest (RF) model that attempts to detect apnoea events from photoplethysmography (PPG) signals acquired from the neck. Those methods are random undersampling (RandUS), random oversampling (RandOS), condensed nearest-neighbors (CNNUS), edited nearest-neighbors (ENNUS), Tomek’s links (TomekUS), synthetic minority oversampling technique (SMOTE), Borderline-SMOTE (BLSMOTE), adaptive synthetic oversampling (ADASYN), SMOTE with TomekUS (SMOTETomek) and SMOTE with ENNUS (SMOTEENN). Feature-space transformation using PCA and KernelPCA was also examined as a potential way of providing better representations of the data for the class rebalancing methods to operate. This work showed that RandUS is the best option for improving the sensitivity score (up to 11%). However, it could hinder the overall accuracy due to the reduced amount of training data. On the other hand, augmenting the data with new artificial data points was shown to be a non-trivial task that needs further development, especially in the presence of subject dependencies, as was the case in this work.

  12. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
U.S. Geological Survey (2025). Best Management Practices Statistical Estimator (BMPSE) Version 1.2.0 [Dataset]. https://catalog.data.gov/dataset/best-management-practices-statistical-estimator-bmpse-version-1-2-0

Data from: Best Management Practices Statistical Estimator (BMPSE) Version 1.2.0

Related Article
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description

The Best Management Practices Statistical Estimator (BMPSE) version 1.2.0 was developed by the U.S. Geological Survey (USGS), in cooperation with the Federal Highway Administration (FHWA) Office of Project Delivery and Environmental Review to provide planning-level information about the performance of structural best management practices for decision makers, planners, and highway engineers to assess and mitigate possible adverse effects of highway and urban runoff on the Nation's receiving waters (Granato 2013, 2014; Granato and others, 2021). The BMPSE was assembled by using a Microsoft Access® database application to facilitate calculation of BMP performance statistics. Granato (2014) developed quantitative methods to estimate values of the trapezoidal-distribution statistics, correlation coefficients, and the minimum irreducible concentration (MIC) from available data. Granato (2014) developed the BMPSE to hold and process data from the International Stormwater Best Management Practices Database (BMPDB, www.bmpdatabase.org). Version 1.0 of the BMPSE contained a subset of the data from the 2012 version of the BMPDB; the current version of the BMPSE (1.2.0) contains a subset of the data from the December 2019 version of the BMPDB. Selected data from the BMPDB were screened for import into the BMPSE in consultation with Jane Clary, the data manager for the BMPDB. Modifications included identifying water quality constituents, making measurement units consistent, identifying paired inflow and outflow values, and converting BMPDB water quality values set as half the detection limit back to the detection limit. Total polycyclic aromatic hydrocarbons (PAH) values were added to the BMPSE from BMPDB data; they were calculated from individual PAH measurements at sites with enough data to calculate totals. The BMPSE tool can sort and rank the data, calculate plotting positions, calculate initial estimates, and calculate potential correlations to facilitate the distribution-fitting process (Granato, 2014). For water-quality ratio analysis the BMPSE generates the input files and the list of filenames for each constituent within the Graphical User Interface (GUI). The BMPSE calculates the Spearman’s rho (ρ) and Kendall’s tau (τ) correlation coefficients with their respective 95-percent confidence limits and the probability that each correlation coefficient value is not significantly different from zero by using standard methods (Granato, 2014). If the 95-percent confidence limit values are of the same sign, then the correlation coefficient is statistically different from zero. For hydrograph extension, the BMPSE calculates ρ and τ between the inflow volume and the hydrograph-extension values (Granato, 2014). For volume reduction, the BMPSE calculates ρ and τ between the inflow volume and the ratio of outflow to inflow volumes (Granato, 2014). For water-quality treatment, the BMPSE calculates ρ and τ between the inflow concentrations and the ratio of outflow to inflow concentrations (Granato, 2014; 2020). The BMPSE also calculates ρ between the inflow and the outflow concentrations when a water-quality treatment analysis is done. The current version (1.2.0) of the BMPSE also has the option to calculate urban-runoff quality statistics from inflows to BMPs by using computer code developed for the Highway Runoff Database (Granato and Cazenas, 2009;Granato, 2019). Granato, G.E., 2013, Stochastic empirical loading and dilution model (SELDM) version 1.0.0: U.S. Geological Survey Techniques and Methods, book 4, chap. C3, 112 p., CD-ROM https://pubs.usgs.gov/tm/04/c03 Granato, G.E., 2014, Statistics for stochastic modeling of volume reduction, hydrograph extension, and water-quality treatment by structural stormwater runoff best management practices (BMPs): U.S. Geological Survey Scientific Investigations Report 2014–5037, 37 p., http://dx.doi.org/10.3133/sir20145037. Granato, G.E., 2019, Highway-Runoff Database (HRDB) Version 1.1.0: U.S. Geological Survey data release, https://doi.org/10.5066/P94VL32J. Granato, G.E., and Cazenas, P.A., 2009, Highway-Runoff Database (HRDB Version 1.0)--A data warehouse and preprocessor for the stochastic empirical loading and dilution model: Washington, D.C., U.S. Department of Transportation, Federal Highway Administration, FHWA-HEP-09-004, 57 p. https://pubs.usgs.gov/sir/2009/5269/disc_content_100a_web/FHWA-HEP-09-004.pdf Granato, G.E., Spaetzel, A.B., and Medalie, L., 2021, Statistical methods for simulating structural stormwater runoff best management practices (BMPs) with the stochastic empirical loading and dilution model (SELDM): U.S. Geological Survey Scientific Investigations Report 2020–5136, 41 p., https://doi.org/10.3133/sir20205136

Search
Clear search
Close search
Google apps
Main menu