Provides an aggregate of data for the Office of the Actuary and the Office of Research, Evaluation and Statistics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
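For orientation, the quantities named above can be written out using their standard textbook definitions. This is not the estimator from the description, only the targets it estimates. The symbols are assumptions introduced here: X is the discrete variable, Y the continuous one, P_i the distributions being compared, w_i their weights, and H the (differential) entropy.

```latex
% Mutual information between a discrete X and a continuous Y
I(X;Y) = \sum_{x} \int p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dy

% Weighted Jensen--Shannon divergence of n distributions, with \sum_i w_i = 1
\mathrm{JSD}_{w}(P_1,\dots,P_n) = H\!\Big(\sum_{i=1}^{n} w_i P_i\Big) - \sum_{i=1}^{n} w_i\,H(P_i)
```

If X labels which data set a sample came from, with P(X = i) = w_i and Y distributed as P_i given X = i, then I(X;Y) equals the weighted Jensen–Shannon divergence. This is why an estimator for the discrete-continuous case can also be used for the divergence.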
Sample descriptive statistics for continuous variables (n = 346).
Simplex-valued data appear throughout statistics and machine learning, for example in the context of transfer learning and compression of deep networks.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A continuous casting machine (hereafter ‘CCM’) is a unit that transforms liquid steel into solid billets of a given section, which are subsequently rolled into products such as rebar. The mould sleeve is the most critical and fastest-wearing part of the CCM mould. The sleeve is a water-cooled copper pipe with a round or profiled section. The molten metal crystallizes in contact with the sleeve walls, and the primary solid shell of the ingot is formed. The main production issue during sleeve operation is that defects appear on the surface of the copper pipe and distort the profile of its inner cavity. This disrupts the thermal conditions, which in turn affects the quality of the resulting ingots. There can be shape defects (for example, the diagonals of a square ingot become unequal, the so-called ‘rhomboidity’), the dimensions of the sides can come out wrong, and the ingot corners may develop cracks. These defects lead to further problems in rolling: lower product quality and a higher number of rejects reduce the economic efficiency of production. To prevent this, the sleeve dimensions are measured at certain intervals along the entire length. If these dimensions deviate from the design values, the sleeve is rejected. Another issue is the shortened useful life of the copper mould sleeves used in production. This is often associated with changes in the operating parameters of the continuous casting machine itself, such as the temperature of the incoming molten metal and the temperature of the cooling water. The actual useful life of a mould sleeve is often less than that stated by the manufacturer, which again leads to additional equipment downtime and increases the possibility of accidents and extra production costs. The expected useful life is as follows: 17,000 tons for 180x180 and 13,000 tons for 150x150.
In the course of CCM operation, the automatic control system that runs the process of casting ingots creates a database of casting parameters. The collected parameters are averaged data for all the strands in each cast; the only thing that differs is the resistance of the sleeve for each strand. After removing the mould sleeve for inspection, the initial data on the process parameters of casting, the geometry of obtained ingots and other attributes can be uploaded from the SCADA. The data were collected from a real production facility but after that they were processed, cleared, aggregated and prepared by the authors to solve the RUL problem.
This column (the remaining useful life target) is formed from the column "resistance, tonn": for each sleeve, num_crystallizer and num_stream, the current value is subtracted from the highest resistance (the moment of breaking).
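The construction of the target column described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code. The column names "resistance, tonn", num_crystallizer and num_stream come from the description. The "sleeve" identifier, the "RUL" output name and the sample values are assumptions made for the example.

```python
from collections import defaultdict

def add_rul(rows):
    """Return copies of the rows with an added 'RUL' value (name is an assumption)."""
    # Highest resistance reached by each sleeve: the moment of breaking.
    max_resistance = defaultdict(float)
    for row in rows:
        key = (row["sleeve"], row["num_crystallizer"], row["num_stream"])
        max_resistance[key] = max(max_resistance[key], row["resistance, tonn"])

    # Remaining useful life = resistance at breaking minus current resistance.
    result = []
    for row in rows:
        key = (row["sleeve"], row["num_crystallizer"], row["num_stream"])
        result.append({**row, "RUL": max_resistance[key] - row["resistance, tonn"]})
    return result

# Hypothetical casts for two sleeves (values invented for illustration).
casts = [
    {"sleeve": "A", "num_crystallizer": 1, "num_stream": 1, "resistance, tonn": 500.0},
    {"sleeve": "A", "num_crystallizer": 1, "num_stream": 1, "resistance, tonn": 9000.0},
    {"sleeve": "A", "num_crystallizer": 1, "num_stream": 1, "resistance, tonn": 16000.0},
    {"sleeve": "B", "num_crystallizer": 2, "num_stream": 3, "resistance, tonn": 4000.0},
    {"sleeve": "B", "num_crystallizer": 2, "num_stream": 3, "resistance, tonn": 12000.0},
]

rul_values = [row["RUL"] for row in add_rul(casts)]
print(rul_values)
```

The last cast of each sleeve gets a value of zero by construction.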
The main task, based on the dataset, is to develop a model for determining the remaining useful life of the crystallizer sleeve, in tons or in remaining casts (the ‘RUL problem’). It is recommended to solve the problem for each cast from the first to the second-to-last. However, in addition to solving the RUL problem, it is always important for production to determine the main factors that shorten or extend the remaining useful life. This is relevant because many sleeves fail before reaching the expected useful life indicated above. To this end, you may want to tackle the following tasks: - identify the factors that affect the remaining useful life, - develop recommendations on how to increase it, - compare the performance of sleeves that have and have not reached the target resistance and determine the parameters that brought this about.
This data release includes water-quality data collected at up to thirteen locations along the Merrimack River and Merrimack River Estuary in Massachusetts. In this study, conducted by the U.S. Geological Survey (USGS) in cooperation with the Massachusetts Department of Environmental Protection, discrete samples were collected, and continuous monitoring was completed from June to September 2020. The data include results of measured field properties (water temperature, specific conductivity, pH, dissolved oxygen) and laboratory concentrations of nitrogen and phosphorus species, total carbon, pheophytin-a, and chlorophyll-a. These data were collected to assess selected water-quality conditions (mainly nutrients) in the Merrimack River and Merrimack River Estuary at the thirteen locations and identify areas where more water-quality monitoring is needed. The discrete samples and continuous-monitoring data are also available in the USGS National Water Information System at https://waterdata.usgs.gov/nwis. This data release consists of (1) Table of the discrete water-quality data collected (Merrimack_DiscreteWQ_Data.csv); (2) Statistical summaries including the minimum, median, and maximum of the discrete water-quality data collected (Merrimack_DiscreteWQ_Statistical_Data.original.csv); (3) Statistical summaries including the minimum, median, and maximum of the continuous water-quality data collected (Merrimack_ContinuousWQ_Statistical_Data.csv); (4) Table of vertical profile data (Merrimack_VerticalWQ_Profiles_Data.csv); (5) Table of continuous monitor deployment location and dates (Merrimack_ContinuousWQ_Deployment_Dates.csv); (6) Time-series plots of continuous water-quality data (Continuous_QW_Plots_All.zip); (7) Vertical profile plots (Vertical Profiles_QW_Plots.zip).
Many variables in biological research - from body size to life history timing to environmental characteristics - are measured continuously (e.g., body mass in kilograms) but analyzed as categories (e.g., large versus small), which can lower statistical power and change interpretation. We conducted a mini-review of 72 recent publications in six popular ecology, evolution, and behavior journals to quantify the prevalence of categorization. We then summarized commonly categorized metrics and simulated a dataset to demonstrate the drawbacks of categorization using common variables and realistic examples. We show that categorizing continuous variables is common (31% of publications reviewed). We also underscore that predictor variables can and should be collected and analyzed continuously. Finally, we provide recommendations on how to keep variables continuous throughout the entire scientific process. Together, these pieces comprise an actionable guide to increasing statistical power and fac...
# Overcoming the pitfalls of categorizing continuous variables in ecology and evolutionary biology
https://doi.org/10.5061/dryad.5x69p8d9r
We simulated data to quantify the detrimental impact of categorizing continuous variables using various statistical breakpoints and sample sizes (details below). To give the example biological relevance, we created a dataset that illustrates the complexity of life history theory and climate change impacts, and contains a predictor variable that is frequently categorized (Table 2) - reproductive timing in one year and its effect on body size in the following year. A reasonable research question would be: How does timing of reproduction in year t influence body mass at the start of the breeding season in year t+1? For illustrative purposes, let’s say we collected data from individually banded penguins in Antarctica. Based on the mechanistic relationships between seasonally available sea ice and food availabi...
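The loss of information from categorization can be illustrated with a small simulation. The sketch below is not the authors' simulation. The effect size, the sample size, the standardized "timing" variable and the median split are all assumptions chosen for illustration. It compares the correlation of body mass with continuous timing against the correlation with an "early versus late" category.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000                 # hypothetical number of individuals

# Hypothetical standardized reproductive timing in year t.
timing = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical body mass in year t+1: later breeding lowers mass, plus noise.
mass = [-0.5 * t + rng.gauss(0.0, 1.0) for t in timing]

# Categorize the predictor with a median split: 1 = "late", 0 = "early".
cutoff = statistics.median(timing)
late = [1.0 if t > cutoff else 0.0 for t in timing]

r_cont = pearson(timing, mass)  # continuous predictor
r_cat = pearson(late, mass)     # categorized predictor

print(f"continuous: r = {r_cont:.3f}, categorized: r = {r_cat:.3f}")
```

Under these assumptions the categorized predictor shows a weaker association with the response than the continuous one. A weaker observed association is what lowers statistical power.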
Access to up-to-date socio-economic data is a widespread challenge in Papua New Guinea and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.
For PNG, after five rounds of data collection from 2020-2022, monthly HFPS data collection commenced in April 2023 and continued for 18 months (ending September 2024), covering topics including employment, income, food security, health, food prices, assets and well-being. This followed an initial pilot of the data collection from January 2023-March 2023. Data for April 2023-September 2023 were a repeated cross section, while October 2023 established the first month of a panel, which is ongoing as of March 2025. For each month, approximately 550-1000 households were interviewed. The sample is representative of urban and rural areas but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in PNG. There is one data file for household-level data with a unique household ID, and separate files for individual-level data within each household and for household food price data, which can be matched to the household file using the household ID. A unique individual ID within the household can be used to track individuals over time within households.
Urban and rural areas of Papua New Guinea
Household, Individual
Sample survey data [ssd]
The initial sample was drawn through Random Digit Dialing (RDD) with geographic stratification from a large random sample of Digicel’s subscribers. As an objective of the survey was to measure changes in household economic wellbeing over time, the HFPS sought to contact a consistent number of households across each province month to month. This was initially a repeated cross section from April 2023-Dec 2023. The resulting overall sample has a probability-based weighted design, with a proportionate stratification to achieve a proper geographical representation. More information on sampling for the cross-sectional monthly sample can be found in previous documentation for the PNG HFPS data.
A monthly panel was established in October 2023 and is ongoing as of March 2025. In each round of data collection after the panel was established, the survey firm would first attempt to contact all households from the previous month, and then attempt to contact households from earlier months that had dropped out. After previous numbers were exhausted, RDD with geographic stratification was used for replacement households.
Computer Assisted Telephone Interview [cati]
The questionnaire, which can be found in the External Resources of this documentation, is in English with a Pidgin translation.
The survey instrument for Q1 2025 consists of the following modules:
-1. Basic Household information
-2. Household Roster
-3. Labor
-4a. Food security
-4b. Food prices
-5. Household income
-6. Agriculture
-8. Access to services
-9. Assets
-10. Wellbeing and shocks
-10a. WASH
The raw data were cleaned by the World Bank team using STATA. This included formatting and correcting errors identified through the survey’s monitoring and quality control process. The data are presented in two datasets: a household dataset and an individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, food prices, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (id_member) can be found in the individual dataset.
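Matching the individual dataset to the household dataset through the household identifier could look like the sketch below. It uses only the Python standard library and small in-memory CSV text. The identifiers hhid and id_member are from the description. The other column names (area, food_insecure, age, employed) and all values are hypothetical stand-ins, not the real variable names in the files.

```python
import csv
import io

# Hypothetical excerpts standing in for the household and individual files.
household_csv = """hhid,area,food_insecure
H001,urban,1
H002,rural,0
"""
individual_csv = """hhid,id_member,age,employed
H001,1,34,1
H001,2,31,0
H002,1,52,1
"""

# Index households by hhid so each individual can look up its household.
households = {
    row["hhid"]: row for row in csv.DictReader(io.StringIO(household_csv))
}

merged = []
for person in csv.DictReader(io.StringIO(individual_csv)):
    household = households.get(person["hhid"])
    if household is None:
        continue  # individual without a matching household record
    # One output row per individual, carrying the household-level columns.
    merged.append({**household, **person})

for row in merged:
    print(row["hhid"], row["id_member"], row["area"], row["employed"])
```

For panel months, the pair (hhid, id_member) would serve as the key for following the same person across survey rounds.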
CANNS Analysis Datasets
This repository contains example datasets for the CANNS (Continuous Attractor Neural Networks) data analysis package.
Datasets
ROI_data.txt (703 KB)
Description: 1D CANN ROI data for bump analysis
Format: Text file with neural activity measurements
Usage: 1D CANN analysis, MCMC bump fitting
Example: Used in 1D CANN analysis tutorials
grid_1.npz (8.7 MB)
Description: Grid cell spike data with position information Format:… See the full description on the dataset page: https://huggingface.co/datasets/canns-team/data-analysis-datasets.
Sample of sequential rules mined from anonymity datasets generated by LBS continuous queries.
Access to up-to-date socio-economic data is a widespread challenge in Vanuatu and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.
For Vanuatu, data for December 2023 to January 2025 were collected, with each month having approximately 1000 households in the sample. The sample is representative of urban and rural areas but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in Vanuatu. There is one data file for household-level data with a unique household ID, and a separate file for individual-level data within each household, which can be matched to the household file using the household ID. The individual file also has a unique individual ID within the household, which can be used to track individuals over time within households where the data are panel data.
National, urban and rural. Six provinces were covered by this survey: Sanma, Shefa, Torba, Penama, Malampa and Tafea.
Household and individuals.
Sample survey data [ssd]
The Vanuatu High Frequency Phone Survey (HFPS) sample is drawn from the list of customer phone numbers (MSIDNS) provided by Digicel Vanuatu, one of the country’s two main mobile providers. Digicel’s customer base spans all regions of Vanuatu. For the initial data collection, Digicel filtered their MSIDNS database to ensure a representative distribution across regions. Recognizing the challenge of reaching low-income respondents, Digicel also included low-income areas and customers with a low-income profile (defined by monthly spending between 50 and 150 VT), as well as those with only incoming calls or using the IOU service without repayment. These filtered lists were then randomized, and enumerators began calling the numbers.
This approach was used to complete the first round of 1,000 interviews. The respondents from this first round formed a panel to be surveyed monthly. Each month, phone numbers from the panel are contacted until all have been interviewed, at which point new phone numbers (fresh MSIDNS from Digicel’s database) are used to replace those that have been exhausted. These new respondents are then added to the panel for future surveys.
Computer Assisted Telephone Interview [cati]
The questionnaire was developed in both English and Bislama. Sections of the Questionnaire:
-Interview Information
-Household Roster (separate modules for new households and returning households)
-Labor (separate modules for new households and returning households)
-Food Security
-Household Income
-Agriculture
-Social Protection
-Access to Services
-Assets
-Perceptions
-Follow-up
At the end of data collection, the raw dataset was cleaned by the survey firm and the World Bank team. Data cleaning mainly included formatting, relabeling, and excluding survey monitoring variables (e.g., interview start and end times). Data was edited using the software STATA.
The data are presented in two datasets: a household dataset and an individual dataset. The total number of observations is 13,779 in the household dataset and 77,501 in the individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (hhid_mem) can be found in the individual dataset.
In November 2024, a total of 7,874 calls were made. Of these, 2,251 calls were successfully connected, and 1,000 respondents completed the survey. By February 2024, the sample was fully comprised of returning respondents, with a re-contact rate of 99.9 percent.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We are using the Yelp Review Dataset as the streaming data source for the DataCI example. We have processed the Yelp review dataset into a daily-based dataset by its date. In this dataset, we will only use the data from 2020-09-01 to 2020-11-30 to simulate the streaming data scenario. We are downloading two versions of the training and validation datasets:
yelp_review_train@2020-10: from 2020-09-01 to 2020-10-15
yelp_review_val@2020-10: from 2020-10-16 to 2020-10-31
yelp_review_train@2020-11: from 2020-10-01 to 2020-11-15
yelp_review_val@2020-11: from 2020-11-16 to 2020-11-30
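The four date windows listed above can be expressed as a small lookup. This is a sketch only: the dataset names and date ranges are taken from the list, while the `SPLITS` table and the `splits_for` helper are hypothetical names introduced here and are not part of DataCI.

```python
from datetime import date

# Inclusive date windows for each dataset version, as listed above.
SPLITS = {
    "yelp_review_train@2020-10": (date(2020, 9, 1), date(2020, 10, 15)),
    "yelp_review_val@2020-10": (date(2020, 10, 16), date(2020, 10, 31)),
    "yelp_review_train@2020-11": (date(2020, 10, 1), date(2020, 11, 15)),
    "yelp_review_val@2020-11": (date(2020, 11, 16), date(2020, 11, 30)),
}

def splits_for(day):
    """Return the names of all dataset versions whose window contains `day`."""
    return [name for name, (start, end) in SPLITS.items() if start <= day <= end]

print(splits_for(date(2020, 10, 20)))
```

The windows overlap across versions. A review dated 2020-10-20 belongs to the October validation set and also to the November training set, which is consistent with the simulated streaming scenario.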
Access to up-to-date socio-economic data is a widespread challenge in Tonga and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details. For Tonga, after two rounds of data collection in 2022, monthly HFPS data collection commenced in April 2023 and continued until November 2024 (but with some gaps in the months of collection). The survey collected socio-economic data on topics including employment, income, food security, health, food prices, assets and well-being. Each month of collection has approximately 415 households in the sample and is representative of urban and rural areas. This dataset contains combined monthly survey data for all months of the continuous HFPS in Tonga.
National urban and rural areas (5 islands): Tongatapu, Vava'u, Ha'apai, Eua, Ongo Niua
Individual and household.
Sample survey data [ssd]
The Tonga High Frequency Phone Survey (HFPS) monthly sample was generated in three ways. The first method is a Random Digit Dialing (RDD) process covering all cell telephone numbers active at the time of the sample selection. The RDD methodology generates virtually all possible telephone numbers in the country under the national telephone numbering plan and then draws a random sample of numbers. This method guarantees full coverage of the population with a phone.
First, a large first-phase sample of cell phone numbers was selected and screened through an automated process to identify the active numbers. Then, a smaller second-phase sample was selected from the active residential numbers identified in the first-phase sample and was delivered to the data collection team to be called by the interviewers. When a cell phone was called, the call answerer was interviewed as long as he or she was 18 years of age or above and knowledgeable about the household activities.
It was initially planned to stratify the sample by island group based on the phone number prefixes. However, this was not feasible given the high internal migration across islands and the atypical assignment of phone number prefixes across islands in Tonga. The raw sample overrepresents urban areas and the population of Tongatapu.
Computer Assisted Telephone Interview [cati]
The questionnaire was developed in both English and Tongan and can be found in this documentation in Excel format. Sections of the Questionnaire are provided below:
1. Interview information and Basic information
2. Household roster
3. Labor
4. Food security and food prices
5. Household income
6. Agriculture
7. Social protection
8. Access to services
9. Assets
10. Education
11. Follow up
At the end of data collection, the raw dataset was cleaned by the survey firm and the World Bank team. Data cleaning mainly included formatting, relabeling, and excluding survey monitoring variables (e.g., interview start and end times). Data was edited using the software Stata.
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid development of the Internet, the continuous increase of malware and its variants has brought great challenges for cyber security. Because the data distribution is imbalanced, research on malware detection focuses on accuracy over the whole data sample while ignoring the detection rate of minority-category malware. In the dataset, normal samples account for the majority, while attack malware accounts for the minority. However, minority-category attacks can bring great losses to countries, enterprises, or individuals. To solve this problem, this study proposed the Gaussian noise generation strategy (GNGS) algorithm to construct a new balanced dataset, so that the model pays more attention to learning the features of minority attack malware and improves its detection rate. Traditional malware detection methods are highly dependent on professional knowledge and static analysis, so we used the Self-Attention with Gate mechanism (SAG) based on the Transformer to carry out feature extraction between the local and global features and filter irrelevant noise information, then extracted the long-distance dependency temporal sequence features with the BiGRU network, and obtained the classification results through the SoftMax classifier. In the study, we used the Alibaba Cloud dataset for malware multi-classification. Comparing the GSB deep learning network model with other current studies, the experimental results showed that GNGS could solve the unbalanced distribution of minority-category malware and that the SAG-BiGRU algorithm obtained an accuracy of 88.7% on the eight-class task, which is better than other existing algorithms. The GSB model also performed well on the NSL-KDD dataset, which shows that the GSB model is effective for other network intrusion detection.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Traditionally, datasets with multiple censored time-to-events have not been utilized in multivariate analysis because of their high level of complexity. In this paper, we propose the Censored Time Interval Analysis (CTIVA) method to address this issue. It estimates the joint probability distribution of actual event times in the censored dataset by implementing a statistical probability density estimation technique on the dataset. Based on the acquired event time, CTIVA investigates variables correlated with the interval time of events via statistical tests. The proposed method handles both categorical and continuous variables simultaneously—thus, it is suitable for application on real-world censored time-to-event datasets, which include both categorical and continuous variables. CTIVA outperforms traditional censored time-to-event data handling methods by 5% on simulation data. The average area under the curve (AUC) of the proposed method on the simulation dataset exceeds 0.9 under various conditions. Further, CTIVA yields novel results on National Sample Cohort Demo (NSCD) and proteasome inhibitor bortezomib dataset, a real-world censored time-to-event dataset of medical history of beneficiaries provided by the National Health Insurance Sharing Service (NHISS) and National Center for Biotechnology Information (NCBI). We believe that the development of CTIVA is a milestone in the investigation of variables correlated with interval time of events in presence of censoring.
Access to up-to-date socio-economic data is a widespread challenge in Solomon Islands and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.
For Solomon Islands, after five rounds of data collection from 2020-2020, monthly HFPS data collection commenced in April 2023 and continued for 18 months (ending September 2024), covering topics including employment, income, food security, health, food prices, assets and well-being. Fieldwork took place in two non-consecutive weeks of each month. Data for April 2023-December 2023 were a repeated cross section, while January 2024 established the first month of a panel, which was continued to September 2024. Each month has approximately 550 households in the sample and is representative of urban and rural areas, but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in Solomon Islands. There is one data file for household-level data with a unique household ID, and a separate file for individual-level data within each household, which can be matched to the household file using the household ID. The individual file also has a unique individual ID within the household, which can be used to track individuals over time within households where the data are panel data.
Urban and rural areas of Solomon Islands.
Household, individual.
Sample survey data [ssd]
The initial sample was drawn through Random Digit Dialing (RDD) with geographic stratification. As an objective of the survey was to measure changes in household economic wellbeing over time, the HFPS sought to contact a consistent number of households across each province month to month. This was initially a repeated cross section from April 2023-Dec 2023. The initial sample was drawn from information provided by a major phone service provider in Solomon Islands, covering all the provinces in the country. It had a probability-based weighted design, with a proportionate stratification to achieve geographical representation. The geographical distribution compared to the 2019 Census is listed below for the first month of the HFPS monthly survey:
Choiseul: Census: 4.3%, HFPS: 5.2%
Western: Census: 14.4%, HFPS: 13.7%
Isabel: Census: 4.8%, HFPS: 4.7%
Central: Census: 3.6%, HFPS: 5.2%
Ren Bell: Census: 0.6%, HFPS: 1.4%
Guadalcanal: Census: 19.8%, HFPS: 21.1%
Malaita: Census: 23.1%, HFPS: 18.7%
Makira: Census: 5.6%, HFPS: 5.6%
Temotu: Census: 3.0%, HFPS: 3%
Honiara: Census: 20.7%, HFPS: 21.3%
Source: Census of Population and Housing 2019
Note: The values in the HFPS column represent the proportion of survey participants residing in each province, based on the raw HFPS data from April.
In April 2023, the geographic distribution of World Bank HFPS participants was generally similar to that of the census data at the province level, though within provinces, areas with less mobile phone connectivity are likely to be underrepresented. One indication of this is that urban areas constituted 38.2 percent of the survey sample, which is a slight overrepresentation, compared to 32.5 percent in the Census 2019.
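The comparison between the census and the raw April sample can be made explicit by computing the gap in percentage points for each province. The sketch below uses the shares listed above. The variable names are introduced here for illustration, and the calculation is a plain difference, not the survey's weighting procedure.

```python
# (census share %, raw HFPS share %) per province, from the figures listed above.
shares = {
    "Choiseul": (4.3, 5.2),
    "Western": (14.4, 13.7),
    "Isabel": (4.8, 4.7),
    "Central": (3.6, 5.2),
    "Ren Bell": (0.6, 1.4),
    "Guadalcanal": (19.8, 21.1),
    "Malaita": (23.1, 18.7),
    "Makira": (5.6, 5.6),
    "Temotu": (3.0, 3.0),
    "Honiara": (20.7, 21.3),
}

# Positive gap: province overrepresented in the raw sample; negative: underrepresented.
gaps = {
    province: round(hfps - census, 1)
    for province, (census, hfps) in shares.items()
}

# Province whose raw sample share departs most from its census share.
largest = max(gaps, key=lambda province: abs(gaps[province]))

for province, gap in gaps.items():
    print(f"{province}: {gap:+.1f} percentage points")
print("Largest gap:", largest)
```

On these figures Malaita has the largest gap, with a raw sample share 4.4 percentage points below its census share.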
A monthly panel was established in January 2024 and is ongoing as of March 2025. In each subsequent month after January 2024, the survey firm would first attempt to contact all households from the previous month and then attempt to contact households from earlier months that had dropped out. After previous numbers were exhausted, RDD with geographic stratification was used for replacement households. Across all months of the survey, a total of 9,926 interviews were completed.
Computer Assisted Telephone Interview [cati]
The questionnaire, which can be found in the External Resources of this documentation, is available in English, with Solomons Pijin translation. There were few changes to the questionnaire across the survey months, but some sections were only introduced in 2024, namely energy access questions and questions to inform the baseline data of the Solomon Islands Government Integrated Economic Development and Climate Resilience (IEDCR) project.
The raw data were cleaned by the World Bank team using STATA. This included formatting and correcting errors identified through the survey’s monitoring and quality control process. The data are presented in two datasets: a household dataset and an individual dataset. The total number of observations is 9,926 in the household dataset and 62,054 in the individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, food prices, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (id_member) can be found in the individual dataset.
Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two in northeast Ohio on inland reservoirs) to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily or continuously measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (< 2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) N to P ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.
This dataset includes examples of different tenses in the English language to aid English learners and educators in understanding the usage of various tenses. Each example sentence is paired with the corresponding tense it represents.
Dataset Information:
Data Collection Source: The data was created by generating example sentences that demonstrate the use of different tenses in the English language.
Data Fields:
Sentence: An example sentence in English.
Tense:
- present
- past
- future
- present continuous
- past continuous
- future continuous
- present perfect
- past perfect
- future perfect
- present perfect continuous
- past perfect continuous
Contains all operator and inspector dust samples taken as gravimetric samples. It includes information such as cassette numbers, the date the sample was taken, initial and final weights, sample type, occupation codes related to the person taking the sample, and mine information. Cassette number is the primary key for gravimetric samples. It also contains operator Continuous Personal Dust Monitor (CPDM) samples as of 2/1/2016. The unique key for these is the CPDM file name. This dataset can be linked to the Mines dataset for further mine information.