100+ datasets found
  1. 2010 County and City-Level Water-Use Data and Associated Explanatory...

    • catalog.data.gov
    • data.usgs.gov
    • +3more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). 2010 County and City-Level Water-Use Data and Associated Explanatory Variables [Dataset]. https://catalog.data.gov/dataset/2010-county-and-city-level-water-use-data-and-associated-explanatory-variables
    Dataset provided by
    U.S. Geological Survey
    Description

    This data release contains the input-data files and R scripts associated with the analysis presented in [citation of manuscript]. The spatial extent of the data is the contiguous U.S. The input-data files include one comma-separated values (csv) file of county-level data and one csv file of city-level data. The county-level csv ("county_data.csv") contains data for 3,109 counties. These data include two measures of water use, descriptive information about each county, three grouping variables (climate region, urban class, and economic dependency), and 18 explanatory variables: proportion of population growth from 2000-2010, fraction of withdrawals from surface water, average daily water yield, mean annual maximum temperature from 1970-2010, 2005-2010 maximum temperature departure from the 40-year maximum, mean annual precipitation from 1970-2010, 2005-2010 mean precipitation departure from the 40-year mean, Gini income disparity index, percent of county population with at least some college education, Cook Partisan Voting Index, housing density, median household income, average number of people per household, median age of structures, percent of renters, percent of single family homes, percent apartments, and a numeric version of urban class. The city-level csv ("city_data.csv") contains data for 83 cities. These data include descriptive information for each city, water-use measures, one grouping variable (climate region), and six explanatory variables: type of water bill (increasing block rate, decreasing block rate, or uniform), average price of water bill, number of requirement-oriented water conservation policies, number of rebate-oriented water conservation policies, aridity index, and regional price parity. The R scripts construct fixed-effects and Bayesian hierarchical regression models. The primary difference between these models relates to how they handle possible clustering in the observations that define unique water-use settings. Fixed-effects models address possible clustering in one of two ways. In a "fully pooled" fixed-effects model, any clustering by group is ignored, and a single, fixed estimate of the coefficient for each covariate is developed using all of the observations. Conversely, in an unpooled fixed-effects model, separate coefficient estimates are developed using only the observations in each group. A hierarchical model provides a compromise between these two extremes. Hierarchical models extend single-level regression to data with a nested structure, whereby the model parameters vary at different levels in the model, including a lower level that describes the actual data and an upper level that influences the values taken by parameters in the lower level. The county-level models were compared using the Watanabe-Akaike information criterion (WAIC), which is derived from the log pointwise predictive density of the models and can be shown to approximate out-of-sample predictive performance. All script files are intended to be used with R statistical software (R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org) and Stan probabilistic modeling software (Stan Development Team. 2017. RStan: the R interface to Stan. R package version 2.16.2. http://mc-stan.org).
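    As an illustration of the two fixed-effects extremes, a minimal sketch in R, assuming hypothetical column names ("wn" for the water-use measure, "hdens" and "mhi" for two of the 18 explanatory variables, "climate_region" for a grouping variable); the actual scripts in the release define the real variable names and the Stan models:

        # Fully pooled fixed-effects model: clustering by group is ignored and
        # a single coefficient per covariate is estimated from all observations.
        county <- read.csv("county_data.csv")
        pooled <- lm(wn ~ hdens + mhi, data = county)

        # Unpooled fixed-effects models: separate coefficients are estimated
        # using only the observations within each climate region.
        unpooled <- lapply(split(county, county$climate_region),
                           function(d) lm(wn ~ hdens + mhi, data = d))

    The hierarchical compromise between these extremes is what the release's Stan models implement; WAIC comparisons of such models can be computed in R with, for example, the loo package.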

  2. Data from: Open-data release of aggregated Australian school-level...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Monteiro Lobato (2020). Open-data release of aggregated Australian school-level information. Edition 2016.1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_46086
    Dataset authored and provided by
    Monteiro Lobato
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Australia
    Description

    The file set is a freely downloadable aggregation of information about Australian schools. The individual files represent a series of tables which, when considered together, form a relational database. The records cover the years 2008-2014 and include information on approximately 9500 primary and secondary school main campuses and around 500 sub-campuses. The records all relate to school-level data; no data about individuals are included. All the information has previously been published and is publicly available, but it has not previously been released as a documented, useful aggregation. The information includes: (a) the names of schools; (b) staffing levels, including full-time and part-time teaching and non-teaching staff; (c) student enrolments, including the number of boys and girls; (d) school financial information, including Commonwealth government, state government, and private funding; (e) test data, potentially for school years 3, 5, 7 and 9, relating to an Australian national testing programme known by the trademark 'NAPLAN'.

    Documentation of this Edition 2016.1 is incomplete but the organization of the data should be readily understandable to most people. If you are a researcher, the simplest way to study the data is to make use of the SQLite3 database called 'school-data-2016-1.db'. If you are unsure how to use an SQLite database, ask a guru.

    The database was constructed directly from the other included files by running the following command at a command-line prompt: sqlite3 school-data-2016-1.db < school-data-2016-1.sql Note that a few non-consequential errors will be reported if you run this command yourself. The reason for the errors is that the SQLite database is created by importing a series of '.csv' files, each of which contains a header line with the names of the variables in each column. This header information is useful for many statistical packages, but it is not what SQLite expects, so it complains about the header. Despite the complaint, the database will be created correctly.

    Briefly, the data are organized as follows.
    (1) The .csv files ('comma separated values') do not actually use a comma as the field delimiter. Instead, the vertical bar character '|' (ASCII octal 174, decimal 124, hex 7C) is used. If you read the .csv files using Microsoft Excel, Open Office, or Libre Office, you will need to set the field separator to '|'. Check your software documentation to understand how to do this.
    (2) Each school-related record is indexed by an identifier called 'ageid'. The ageid uniquely identifies each school and consequently serves as the appropriate variable for JOIN-ing records in different data files. For example, the first school-related record after the header line in file 'students-headed-bar.csv' shows the ageid of the school as 40000. The relevant school name can be found by looking in the file 'ageidtoname-headed-bar.csv' to discover that the ageid of 40000 corresponds to a school called 'Corpus Christi Catholic School'.
    (3) In addition to the variable 'ageid', each record is also identified by one or two 'year' variables. The most important purpose of a year identifier is to indicate the year that is relevant to the record. For example, turning again to file 'students-headed-bar.csv', one sees that the first seven school-related records after the header line all relate to Corpus Christi Catholic School, with ageid 40000. The variable that identifies the important differences between these seven records is 'studentyear', which shows the year to which the student data refer. One can see, for example, that in 2008 there were a total of 410 students enrolled, of whom 185 were girls and 225 were boys (look at the variable names in the header line).
    (4) The variables relating to years are given different names in each of the different files ('studentsyear' in the file 'students-headed-bar.csv', 'financesummaryyear' in the file 'financesummary-headed-bar.csv'). Despite the different names, the year variables provide the second-level means for joining information across files. For example, to relate the enrolments at a school in each year to its financial state, you might JOIN records using 'ageid' in the two files and, secondarily, match 'studentsyear' with 'financesummaryyear'. See the sketch after this list.
    (5) The manipulation of the data is most readily done using the SQL language with the SQLite database, but it can also be done in a variety of statistical packages.
    (6) It is our intention for Edition 2016-2 to create large 'flat' files suitable for use by non-researchers who want to view the data with spreadsheet software. The disadvantage of such 'flat' files is that they contain vast amounts of redundant information and might not display the data in the form that the user most wants.
    (7) Geocoding of the schools is not available in this edition.
    (8) Some files, such as 'sector-headed-bar.csv', are not used in the creation of the database but are provided as a convenience for researchers who might wish to recode some of the data to remove redundancy.
    (9) A detailed example of a suitable SQLite query can be found in the file 'school-data-sqlite-example.sql'. The same query, used in the context of analyses done with the excellent, freely available R statistical package (http://www.r-project.org), can be seen in the file 'school-data-with-sqlite.R'.
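    For readers working from R, a sketch of the JOIN described above using the DBI and RSQLite packages. The table and column names ('students', 'financesummary', 'studentsyear', 'financesummaryyear') are assumptions inferred from the file names; check 'school-data-2016-1.sql' for the actual schema:

        library(DBI)

        # Reading one of the '|'-delimited .csv files directly:
        students <- read.delim("students-headed-bar.csv", sep = "|")

        # Joining enrolment and finance records in the SQLite database on
        # ageid (first-level key) and the year variables (second-level key).
        con <- dbConnect(RSQLite::SQLite(), "school-data-2016-1.db")
        enrol_finance <- dbGetQuery(con, "
          SELECT s.ageid, s.studentsyear, f.*
            FROM students AS s
            JOIN financesummary AS f
              ON s.ageid = f.ageid
             AND s.studentsyear = f.financesummaryyear")
        dbDisconnect(con)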

  3. GRACE-A and GRACE-B Level 1B, Level 1B combined and Level 2 Data Products

    • cmr.earthdata.nasa.gov
    • fedeo.ceos.org
    not provided
    Updated Apr 24, 2025
    Cite
    (2025). GRACE-A and GRACE-B Level 1B, Level 1B combined and Level 2 Data Products [Dataset]. https://cmr.earthdata.nasa.gov/search/concepts/C1965336912-ESA.html
    Time period covered
    Apr 1, 2002 - Oct 27, 2017
    Area covered
    Earth
    Description

    Level-1A Data Products are the result of a non-destructive processing applied to the Level-0 data at NASA/JPL. The sensor calibration factors are applied in order to convert the binary encoded measurements to engineering units. Where necessary, time tag integer second ambiguity is resolved and data are time tagged to the respective satellite receiver clock time. Editing and quality control flags are added, and the data are reformatted for further processing. The Level-1A data are reversible to Level-0, except for the bad data packets. This level also includes the ancillary data products needed for processing to the next data level.

    The Level-1B Data Products are the result of a possibly destructive, or irreversible, processing applied to both the Level-1A and Level-0 data at NASA/JPL. The data are correctly time-tagged, and the data sample rate is reduced from the higher rates of the previous levels. Collectively, the processing from Level-0 to Level-1B is called Level-1 processing. This level also includes the ancillary data products generated during this processing, and the additional data needed for further processing.

    The Level-2 data products include the static and time-variable (monthly) gravity field and related data products derived from the application of Level-2 processing at GFZ, UTCSR and JPL to the previous-level data products. This level also includes ancillary data products such as GFZ's Level-1B short-term atmosphere and ocean de-aliasing product (AOD1B) generated during this processing.

    GRACE-A and GRACE-B Level-1B Data Products:
    • Satellite clock solution [GA-OG-1B-CLKDAT, GB-OG-1B-CLKDAT, GRACE CLKDAT]: offset of the satellite receiver clock relative to GPS time, obtained by linear fit to raw on-board clock offset estimates.
    • GPS flight data [GA-OG-1B-GPSDAT, GB-OG-1B-GPSDAT, GRACE GPSDAT]: preprocessed and calibrated GPS code and phase tracking data, edited and decimated from instrument high-rate (10 s code, 1 s phase) to low-rate (10 s) samples for science use (1 file per day, level-1 format).
    • Accelerometer housekeeping data [GA-OG-1B-ACCHKP, GB-OG-1B-ACCHKP, GRACE ACCHKP]: accelerometer proof-mass bias voltages, capacitive sensor outputs, instrument control unit (ICU) and sensor unit (SU) temperatures, reference voltages, primary and secondary power supply voltages (1 file per day, level-1 format).
    • Accelerometer data [GA-OG-1B-ACCDAT, GB-OG-1B-ACCDAT, GRACE ACCDAT]: preprocessed and calibrated Level-1B accelerometer data, edited and decimated from instrument high-rate (0.1 s) to low-rate (1 s) samples for science use (1 file per day, level-1 format).
    • Intermediate clock solution [GA-OG-1B-INTCLK, GB-OG-1B-INTCLK, GRACE INTCLK]: derived with GIPSY POD software (300 s sample rate) (1 file per day, GIPSY format).
    • Instrument processing unit (IPU) housekeeping data [GA-OG-1B-IPUHKP, GB-OG-1B-IPUHKP, GRACE IPUHKP]: edited and decimated from high-rate (TBD s) to low-rate (TBD s) samples for science use (1 file per day, level-1 format).
    • Spacecraft mass housekeeping data [GA-OG-1B-MASDAT, GB-OG-1B-MASDAT, GRACE MASDAT]: Level-1B mass data as a function of time.
    • GPS navigation solution data [GA-OG-1B-NAVSOL, GB-OG-1B-NAVSOL, GRACE NAVSOL]: edited and decimated from instrument high-rate (60 s) to low-rate (30 s) samples for science use (1 file per day, level-1 format).
    • OBDH time mapping to GPS time housekeeping data [GA-OG-1B-OBDHTM, GB-OG-1B-OBDHTM, GRACE OBDHTM]: on-board data handling (OBDH) time mapping data (OBDH time to receiver time).
    • Star camera data [GA-OG-1B-SCAATT, GB-OG-1B-SCAATT, GRACE SCAATT]: preprocessed and calibrated star camera quaternion data, edited and decimated from instrument high-rate (1 s) to low-rate (5 s) samples for science use (1 file per day, level-1 format).
    • Thruster activation housekeeping data [GA-OG-1B-THRDAT, GB-OG-1B-THRDAT, GRACE THRDAT]: GN2 thruster data used for attitude (10 mN) and orbit (40 mN) control.
    • GN2 tank temperature and pressure housekeeping data [GA-OG-1B-TNKDAT, GB-OG-1B-TNKDAT, GRACE TNKDAT]: GN2 tank temperature and pressure data.
    • Oscillator frequency data [GA-OG-1B-USODAT, GB-OG-1B-USODAT, GRACE USODAT]: derived from the POD product.

    GRACE-A and GRACE-B Combined Level-1B Data Products:
    • Preprocessed and calibrated K-band ranging data [GA-OG-1B-KBRDAT, GB-OG-1B-KBRDAT, GRACE KBRDAT]: range, range-rate and range-acceleration data, edited and decimated from instrument high-rate (0.1 s) to low-rate (5 s) samples for science use (1 file per day, level-1 format).
    • Atmosphere and ocean de-aliasing product [GA-OG-1B-ATMOCN, GB-OG-1B-ATMOCN, GRACE ATMOCN]: GRACE atmosphere and ocean de-aliasing product.

    GRACE Level-2 Data Products:
    • GAC [GA-OG-_2-GAC, GB-OG-_2-GAC, GRACE GAC]: combination of non-tidal atmosphere and ocean spherical harmonic coefficients, provided as an average over a certain time span (same as the corresponding GSM product), based on the level-1 AOD1B product (1 file per time span, level-2 format).
    • GCM [GA-OG-_2-GCM, GB-OG-_2-GCM, GRACE GCM]: spherical harmonic coefficients and standard deviations of the long-term static gravity field, estimated by combining GRACE satellite instrument data and other information for a dedicated time span (multiple years) and spatial resolution (1 file per time span, level-2 format).
    • GAB [GA-OG-_2-GAB, GB-OG-_2-GAB, GRACE GAB]: non-tidal ocean spherical harmonic coefficients, provided as an average over a certain time span (same as the corresponding GSM product), based on the level-1 AOD1B product (1 file per time span, level-2 format).
    • GAD [GA-OG-_2-GAD, GB-OG-_2-GAD, GRACE GAD]: bottom-pressure product combining surface pressure and ocean bottom pressure (over the oceans, and zero over land); spherical harmonic coefficients provided as an average over a certain time span (same as the corresponding GSM product), based on the level-1 AOD1B product (1 file per time span, level-2 format).
    • GSM [GA-OG-_2-GSM, GB-OG-_2-GSM, GRACE GSM]: spherical harmonic coefficients and standard deviations of the static gravity field, estimated from GRACE satellite instrument data only, for a dedicated time span (e.g. weekly, monthly, multiple years) and spatial resolution (1 file per time span, level-2 format).

  4. Enterprise Survey 2009-2019, Panel Data - Slovenia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 6, 2020
    Cite
    World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
    Dataset provided by
    European Investment Bank (http://eib.org/)
    World Bank (http://worldbank.org/)
    European Bank for Reconstruction and Development (http://ebrd.com/)
    Time period covered
    2008 - 2019
    Area covered
    Slovenia
    Description

    Abstract

    The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

    The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

    As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The samples for the Slovenia ES 2009, 2013 and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.

    Three levels of stratification were used in this country: industry, establishment size, and region. The original sample designs, with specific information on the industries and regions chosen, are included in the attached Excel file (Sampling Report.xls) for the Slovenia 2009 ES. For the Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports, respectively, in Appendix E.

    For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and for likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.

    For the Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail, and other services).

    Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

    For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

    For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

    For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

    Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries were surveyed using the Manufacturing questionnaire (which includes the core module plus manufacturing-specific questions). Retail firms were interviewed using the Services questionnaire (which includes the core module plus retail-specific questions), and the residual eligible services were covered using the Services questionnaire (which includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

    Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

    For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

    For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in Slovenia may be selection bias and not frame inaccuracy.

    For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

    Finally, for 2019, the share of realized interviews per contacted establishment was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.

  5. Example Groundwater-Level Datasets and Benchmarking Results for the...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Oct 13, 2024
    Cite
    U.S. Geological Survey (2024). Example Groundwater-Level Datasets and Benchmarking Results for the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) Software Package [Dataset]. https://catalog.data.gov/dataset/example-groundwater-level-datasets-and-benchmarking-results-for-the-automated-regional-cor
    Dataset provided by
    U.S. Geological Survey
    Description

    This data release provides two example groundwater-level datasets used to benchmark the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) software package (Levy and others, 2024). The first dataset contains groundwater-level records and site metadata for wells located on Long Island, New York (NY) and some surrounding mainland sites in New York and Connecticut. The second dataset contains groundwater-level records and site metadata for wells located in the southeastern San Joaquin Valley of the Central Valley, California (CA). For ease of exposition these are referred to as the NY and CA datasets, respectively. Both datasets are formatted with column headers that can be read by the ARCHI software package within the R computing environment. These datasets were used to benchmark the imputation accuracy of three ARCHI model settings (OLS, ridge, and MOVE.1) against the widely used imputation program missForest (Stekhoven and Bühlmann, 2012). The ARCHI program was used to process the NY and CA datasets on monthly and annual timesteps, respectively, filter out sites with insufficient data for imputation, and create 200 test datasets from each of the example datasets with 5 percent of observations removed at random (herein referred to as "holdouts"). Imputation accuracy for test datasets was assessed using normalized root mean square error (NRMSE), which is the root mean square error divided by the standard deviation of the observed holdout values. ARCHI produces prediction intervals (PIs) using a non-parametric bootstrapping routine, which were assessed by computing a coverage rate (CR) defined as the proportion of holdout observations falling within the estimated PI. The multiple regression models included with the ARCHI package (OLS and ridge) were further tested on all test datasets at eleven different levels of the p_per_n input parameter, which limits the maximum ratio of regression model predictors (p) to observations (n) as a decimal fraction greater than zero and less than or equal to one.

    This data release contains ten tables formatted as tab-delimited text files. The "CA_data.txt" and "NY_data.txt" tables contain 243,094 and 89,997 depth-to-groundwater measurement values (value, in feet below land surface) indexed by site identifier (site_no) and measurement date (date) for the CA and NY datasets, respectively. The "CA_sites.txt" and "NY_sites.txt" tables contain site metadata for the 4,380 and 476 unique sites included in the CA and NY datasets, respectively. The "CA_NRMSE.txt" and "NY_NRMSE.txt" tables contain NRMSE values computed by imputing 200 test datasets with 5 percent random holdouts to assess imputation accuracy for three different ARCHI model settings and missForest using the CA and NY datasets, respectively. The "CA_CR.txt" and "NY_CR.txt" tables contain CR values used to evaluate non-parametric PIs generated by bootstrapping regressions with three different ARCHI model settings using the CA and NY test datasets, respectively. The "CA_p_per_n.txt" and "NY_p_per_n.txt" tables contain mean NRMSE values computed for 200 test datasets with 5 percent random holdouts at 11 different levels of p_per_n for OLS and ridge models, compared to training error for the same models on the entire CA and NY datasets, respectively.

    References Cited

    Levy, Z.F., Stagnitta, T.J., and Glas, R.L., 2024, ARCHI: Automated Regional Correlation Analysis for Hydrologic Record Imputation, v1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/P1VVHWKE.

    Stekhoven, D.J., and Bühlmann, P., 2012, MissForest—non-parametric missing value imputation for mixed-type data: Bioinformatics 28(1), 112-118, https://doi.org/10.1093/bioinformatics/btr597.
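    The two benchmarking metrics defined above are straightforward to express in R. A minimal sketch, where obs holds the held-out observed values, pred the imputed values, and lwr/upr the bounds of the bootstrapped prediction intervals:

        # NRMSE: root mean square error divided by the standard deviation
        # of the observed holdout values.
        nrmse <- function(obs, pred) sqrt(mean((obs - pred)^2)) / sd(obs)

        # Coverage rate (CR): proportion of holdout observations falling
        # within the estimated prediction interval.
        coverage_rate <- function(obs, lwr, upr) mean(obs >= lwr & obs <= upr)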

  6. Data for multiple linear regression models for predicting microcystin...

    • catalog.data.gov
    • data.usgs.gov
    • +3more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-predicting-microcystin-concentration-action
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Ohio
    Description

    Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two in northeast Ohio on inland reservoirs) to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily or continuously measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models were useful at some sites with lagged variables (<2 weeks) for cyanobacterial toxin genes, dissolved nutrients, and (or) N to P ratios. Comprehensive models are presented in three categories: (1) three models with continuous monitor data and lagged comprehensive variables, (2) five models with no continuous monitor data and lagged comprehensive variables, and (3) one model with continuous monitor data and same-day comprehensive variables. Funding for this work was provided by the Ohio Water Development Authority and the U.S. Geological Survey Cooperative Water Program.
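    As a hedged illustration of the model form (not the published site-specific models), a real-time multiple linear regression in R with hypothetical file, column, and threshold values:

        # Hypothetical site data with continuous-monitor averages antecedent
        # to sample collection; all names here are placeholders.
        site <- read.csv("site_data.csv")
        fit <- lm(log10(microcystin) ~ phycocyanin + pH + spec_cond + streamflow,
                  data = site)

        # Flag predicted action-level exceedances (threshold illustrative only;
        # use the applicable Ohio action level).
        action_level <- 6
        exceeds <- predict(fit, newdata = site) > log10(action_level)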

  7. Household Expenditure and Income Survey 2010, Economic Research Forum (ERF)...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    The Hashemite Kingdom of Jordan Department of Statistics (DOS) (2019). Household Expenditure and Income Survey 2010, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://datacatalog.ihsn.org/catalog/7662
    Dataset authored and provided by
    The Hashemite Kingdom of Jordan Department of Statistics (DOS)
    Time period covered
    2010 - 2011
    Area covered
    Jordan
    Description

    Abstract

    The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.

    Data collected through the survey helped in achieving the following objectives:
    1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
    2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
    3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
    4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
    5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
    6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor, as well as drawing poverty maps
    7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those aimed at eradicating poverty

    Geographic coverage

    National

    Analysis unit

    • Households
    • Individuals

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Household Expenditure and Income Survey sample for 2010 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district to enable drawing a poverty map of Jordan. The General Census of Population and Housing in 2004 provided a detailed frame of housing and households for the different administrative levels in the country. Jordan is administratively divided into 12 governorates; each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). Each sub-district contains a number of communities (cities and villages), and each community was divided into a number of blocks, each containing between 60 and 100 houses. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey frame.

    A two-stage stratified cluster sampling technique was used. In the first stage, clusters were selected with probability proportional to size, the number of households in each cluster serving as the cluster weight. At the second stage, a sample of 8 households was selected from each cluster, along with another 4 households selected as a backup for the basic sample, using a systematic sampling technique; the backup households were used during the first visit to the block in case the originally selected household could not be interviewed for any reason. For the purposes of this survey, each sub-district was considered a separate stratum to ensure that results could be produced at the sub-district level. In this respect, the survey adopted the frame provided by the General Census of Population and Housing in dividing the sample strata. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable provided in the Household Expenditure and Income Survey for the year 2008 were calculated for each sub-district. These results were used to estimate the sample size at the sub-district level so that the coefficient of variation of the expenditure variable in each sub-district would be less than 10%, with a minimum number of clusters per sub-district (6 clusters), to ensure adequate representation of clusters in different administrative areas and to enable drawing an indicative poverty map.

    It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas where poor households are concentrated in major cities. These were taken into consideration during the sample design phase, and a higher number of households was selected from those areas to ensure good coverage of all regions where poverty is widespread.
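    A compact sketch of this two-stage design in R, using a simulated frame (all names and sizes below are illustrative, not survey values):

        set.seed(1)
        # Simulated frame: clusters (blocks) of 60-100 households in two strata.
        frame <- data.frame(cluster = 1:500,
                            stratum = rep(c("subdistrict_A", "subdistrict_B"), 250),
                            n_households = sample(60:100, 500, replace = TRUE))

        # Stage 1: within each stratum, draw 6 clusters with probability
        # proportional to size (household count as the cluster weight).
        stage1 <- do.call(rbind, lapply(split(frame, frame$stratum), function(s)
          s[sample(nrow(s), size = 6, prob = s$n_households), ]))

        # Stage 2: systematically select 8 households plus 4 backups (12 total)
        # from a cluster of n households, using a random start and fixed step.
        draw_households <- function(n, k = 12) {
          step <- n / k
          floor(runif(1, 0, step) + step * (0:(k - 1))) + 1
        }
        draw_households(stage1$n_households[1])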

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    • General form
    • Expenditure on food commodities form
    • Expenditure on non-food commodities form

    Cleaning operations

    Raw Data:
    - Organizing forms/questionnaires: a compatible archive system was used to classify the forms according to different rounds throughout the year. A registry was prepared to indicate the different stages of the process of data checking, coding and entry until forms were returned to the archive system.
    - Data office checking: this phase was carried out concurrently with the data collection phase in the field, where questionnaires completed in the field were immediately sent to the data office for checking.
    - Data coding: a team was trained to work on the data coding phase, which in this survey was limited to education specialization, profession and economic activity. International classifications were used for these; for the rest of the questions, coding was predefined during the design phase.
    - Data entry/validation: a team consisting of system analysts, programmers and data entry personnel worked on the data at this stage. System analysts and programmers started by identifying the survey framework and questionnaire fields to help build computerized data entry forms. A set of validation rules was added to the entry forms to ensure the accuracy of the data entered. A team was then trained to complete the data entry process. Forms prepared for data entry were provided by the archive department to ensure forms were correctly extracted and put back in the archive system. A data validation process was run on the data to ensure the data entered were free of errors.
    - Results tabulation and dissemination: after the completion of all data processing operations, ORACLE was used to tabulate the survey final results. Those results were further checked using similar outputs from SPSS to ensure that the tabulations produced were correct. A check was also run on each table to guarantee consistency of the figures presented, together with the required editing of table titles and report formatting.

    Harmonized Data:
    - The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
    - The harmonization process started with cleaning all raw data files received from the Statistical Office.
    - Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization.
    - A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    - A post-harmonization cleaning process was run on the data.
    - Harmonized data was saved on the household as well as the individual level, in SPSS and converted to STATA format.

  8. AirNow Air Quality Monitoring Data (Current) - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    Cite
    (2024). AirNow Air Quality Monitoring Data (Current) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/airnow-air-quality-monitoring-data-current
    Description

    This United States Environmental Protection Agency (US EPA) feature layer represents monitoring site data: hourly concentrations and Air Quality Index (AQI) values for the latest hour received from monitoring sites that report to AirNow. Map and forecast data are collected using federal reference or equivalent monitoring techniques, or techniques approved by the state, local or tribal monitoring agencies. To maintain "real-time" maps, the data are displayed after the end of each hour. Although preliminary data quality assessments are performed, the data in AirNow are not fully verified and validated through the quality assurance procedures monitoring organizations use to officially submit and certify data on the EPA Air Quality System (AQS).

    This data sharing and centralization creates a one-stop source for real-time and forecast air quality data. The benefits include quality control, national reporting consistency, access to automated mapping methods, and data distribution to the public and other data systems. The U.S. Environmental Protection Agency, National Oceanic and Atmospheric Administration, National Park Service, tribal, state, and local agencies developed the AirNow system to provide the public with easy access to national air quality information. State and local agencies report the Air Quality Index (AQI) for cities across the US and parts of Canada and Mexico. AirNow data are used only to report the AQI, not to formulate or support regulation, guidance or any other EPA decision or position.

    About the AQI

    The Air Quality Index (AQI) is an index for reporting daily air quality. It tells you how clean or polluted your air is, and what associated health effects might be a concern for you. The AQI focuses on health effects you may experience within a few hours or days after breathing polluted air. EPA calculates the AQI for five major air pollutants regulated by the Clean Air Act: ground-level ozone, particle pollution (also known as particulate matter), carbon monoxide, sulfur dioxide, and nitrogen dioxide. For each of these pollutants, EPA has established national air quality standards to protect public health. Ground-level ozone and airborne particles (often referred to as "particulate matter") are the two pollutants that pose the greatest threat to human health in this country.

    A number of factors influence ozone formation, including emissions from cars, trucks, buses, power plants, and industries, along with weather conditions. Weather is especially favorable for ozone formation when it's hot, dry and sunny, and winds are calm and light. Federal and state regulations, including regulations for power plants, vehicles and fuels, are helping reduce ozone pollution nationwide.

    Fine particle pollution (or "particulate matter") can be emitted directly from cars, trucks, buses, power plants and industries, along with wildfires and woodstoves. But it also forms from chemical reactions of other pollutants in the air. Particle pollution can be high at different times of year, depending on where you live. In some areas, for example, colder winters can lead to increased particle pollution emissions from woodstove use, and stagnant weather conditions with calm and light winds can trap PM2.5 pollution near emission sources. Federal and state rules are helping reduce fine particle pollution, including clean diesel rules for vehicles and fuels, and rules to reduce pollution from power plants, industries, locomotives, and marine vessels, among others.

    How Does the AQI Work?

    Think of the AQI as a yardstick that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. For example, an AQI value of 50 represents good air quality with little potential to affect public health, while an AQI value over 300 represents hazardous air quality. An AQI value of 100 generally corresponds to the national air quality standard for the pollutant, which is the level EPA has set to protect public health. AQI values below 100 are generally thought of as satisfactory. When AQI values are above 100, air quality is considered to be unhealthy, at first for certain sensitive groups of people, then for everyone as AQI values get higher.

    Understanding the AQI

    The purpose of the AQI is to help you understand what local air quality means to your health. To make it easier to understand, the AQI is divided into six categories:

    AQI Values     Levels of Health Concern            Colors
    0 to 50        Good                                Green
    51 to 100      Moderate                            Yellow
    101 to 150     Unhealthy for Sensitive Groups      Orange
    151 to 200     Unhealthy                           Red
    201 to 300     Very Unhealthy                      Purple
    301 to 500     Hazardous                           Maroon

    Note: Values above 500 are considered Beyond the AQI. Follow recommendations for the Hazardous category. Additional information on reducing exposure to extremely high levels of particle pollution is available here.

    Each category corresponds to a different level of health concern. The six levels of health concern and what they mean are:
    "Good" AQI is 0 to 50. Air quality is considered satisfactory, and air pollution poses little or no risk.
    "Moderate" AQI is 51 to 100. Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people. For example, people who are unusually sensitive to ozone may experience respiratory symptoms.
    "Unhealthy for Sensitive Groups" AQI is 101 to 150. Although the general public is not likely to be affected at this AQI range, people with lung disease, older adults and children are at greater risk from exposure to ozone, whereas persons with heart and lung disease, older adults and children are at greater risk from the presence of particles in the air.
    "Unhealthy" AQI is 151 to 200. Everyone may begin to experience some adverse health effects, and members of the sensitive groups may experience more serious effects.
    "Very Unhealthy" AQI is 201 to 300. This would trigger a health alert signifying that everyone may experience more serious health effects.
    "Hazardous" AQI greater than 300. This would trigger health warnings of emergency conditions. The entire population is more likely to be affected.

    AQI colors

    EPA has assigned a specific color to each AQI category to make it easier for people to understand quickly whether air pollution is reaching unhealthy levels in their communities. For example, the color orange means that conditions are "unhealthy for sensitive groups," while red means that conditions may be "unhealthy for everyone," and so on.
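    The category breakpoints translate directly into code. A small sketch in R of the table's mapping (not AirNow's own implementation):

        # Map AQI values to the six categories above; values over 500 are
        # "Beyond the AQI".
        aqi_category <- function(aqi) {
          breaks <- c(0, 50, 100, 150, 200, 300, 500)
          labels <- c("Good", "Moderate", "Unhealthy for Sensitive Groups",
                      "Unhealthy", "Very Unhealthy", "Hazardous")
          ifelse(aqi > 500, "Beyond the AQI",
                 as.character(cut(aqi, breaks, labels, include.lowest = TRUE)))
        }
        aqi_category(c(42, 157, 301))  # "Good" "Unhealthy" "Hazardous"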

  9. Groundwater Level Data: All Historic Data

    • hub.arcgis.com
    • data-idwr.hub.arcgis.com
    Updated Jul 18, 2022
    Cite
    Idaho Department of Water Resources (2022). Groundwater Level Data: All Historic Data [Dataset]. https://hub.arcgis.com/documents/f1a190a2077c4b7da87b9cc19d0a316e
    Dataset authored and provided by
    Idaho Department of Water Resources
    Description

    IDWR maintains a groundwater level database containing data primarily collected by IDWR, but it also includes data gathered by the USGS, USBR, and other public and private entities. Please reach out to these other entities to obtain their full record, as not all values are present in this database (IDWR can provide a full list of data contributors upon request). IDWR staff manually measure the "depth to water" in wells throughout Idaho. Pressure transducers in many wells provide near-continuous water level measurements. IDWR strives to create complete and accurate data and may revise these data when indicated.

    “Groundwater Level Data: All Historic Data” includes all well data managed in IDWR’s internal database, regardless of current well status. For example, historic data from discontinued, abandoned, or inactive wells are contained in this dataset. IDWR’s water level data are also hosted in the Groundwater Data Portal (https://idwr-groundwater-data.idaho.gov/), which displays only actively monitored wells.

    The three files included in this download are: 1) discrete (manual) depth-to-water measurements, 2) continuous* (pressure transducer) depth-to-water measurements, and 3) the associated well metadata.

    *The continuous measurements data have been condensed to display only the shallowest daily pressure transducer measurements. Complete datasets are available upon request.
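    A sketch of that condensing step in R, assuming hypothetical file and column names (check the downloaded files for the actual headers):

        # Keep only the shallowest (minimum) depth-to-water value per well
        # per day from the raw transducer records.
        continuous <- read.csv("continuous_measurements.csv")
        continuous$date <- as.Date(continuous$measurement_datetime)
        daily_shallowest <- aggregate(depth_to_water ~ well_id + date,
                                      data = continuous, FUN = min)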

  10. ERA5 hourly data on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Aug 9, 2025
    Cite
    ECMWF (2025). ERA5 hourly data on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.adbb2d47
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    Authors
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf

    Time period covered
    Jan 1, 1940 - Aug 3, 2025
    Description

    ERA5 is the fifth-generation ECMWF reanalysis of the global climate and weather for the past eight decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis.

    Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres: every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.

    ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread.

    ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; users are notified in such cases.

    The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access, and it should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article; information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main sub-sets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land-surface quantities). The present entry is "ERA5 hourly data on single levels from 1940 to present".

  11. OCO-3 Level 1A collated, parsed, science or calibration data V11...

    • catalog.data.gov
    • data.nasa.gov
    • +2more
    Updated Apr 11, 2025
    Cite
    NASA/GSFC/SED/ESD/GCDC/GESDISC (2025). OCO-3 Level 1A collated, parsed, science or calibration data V11 (OCO3_L1aIn_Sample) at GES DISC [Dataset]. https://catalog.data.gov/dataset/oco-3-level-1a-collated-parsed-science-or-calibration-data-v11-oco3-l1ain-sample-at-ges-di
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Version 11 is the current version of the data set; older versions are no longer available and have been superseded by Version 11. The Orbiting Carbon Observatory is the first NASA mission designed to collect space-based measurements of atmospheric carbon dioxide with the precision, resolution, and coverage needed to characterize the processes controlling its buildup in the atmosphere. The OCO-3 project uses the LEOStar-2 spacecraft, which carries a single instrument incorporating three high-resolution spectrometers that make coincident measurements of reflected sunlight in the near-infrared CO2 bands near 1.61 and 2.06 micrometers and in the molecular oxygen (O2) A-band at 0.76 micrometers. The three spectrometers have different characteristics and are calibrated independently. Their raw data numbers (DN) are delivered, correlated in time, to the Level 1B process as Level 1A products. Each band has 1016 spectral elements, although some are masked out in the L2 retrieval. This product is the input to the Level 1B process: it is depacketized raw data formatted into a standard granularity with calibrated engineering data (for both science and calibration observations), in the Sample Mode of operation.

  12. Data from: MSL MARS SAMPLE ANALYSIS AT MARS 5 RDR LEVEL 2 V1.0

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated Apr 10, 2025
    National Aeronautics and Space Administration (2025). MSL MARS SAMPLE ANALYSIS AT MARS 5 RDR LEVEL 2 V1.0 [Dataset]. https://catalog.data.gov/dataset/msl-mars-sample-analysis-at-mars-5-rdr-level-2-v1-0-0398a
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This archive contains Reduced Data Record (RDR) instrument data acquired by the Sample Analysis at Mars (SAM) instrument suite on the Mars Science Laboratory rover. SAM includes the Quadrupole Mass Spectrometer (QMS), the Gas Chromatograph (GC), and the Tunable Laser Spectrometer (TLS).

  13. Data from: South Florida Holocene coral sea-level database for samples...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Jul 6, 2024
    U.S. Geological Survey (2024). South Florida Holocene coral sea-level database for samples collected from 1977 to 2017 [Dataset]. https://catalog.data.gov/dataset/south-florida-holocene-coral-sea-level-database-for-samples-collected-from-1977-to-2017
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Florida
    Description

    Holocene-aged coral samples from the south Florida region were extensively characterized to create a new database of verified sea-level data. The samples were originally collected using coral-reef coring or other geologic sampling methods and were obtained by various researchers from published studies spanning the interval of 1977 to 2017. Many of these samples are presently stored in the U.S. Geological Survey (USGS) Core Archive at the St. Petersburg Coastal and Marine Science Center in St. Petersburg, Florida (https://doi.org/10.5066/F7319TR3). USGS staff compiled all relevant information from publications pertaining to each sample and then, if the samples were available in the USGS Core Archive, researchers performed visual analyses to characterize the taphonomic condition and to determine if samples were in situ (growth position). Samples were then assigned a rank (from 0-3, with 0 being the highest rank) to indicate their quality and reliability for use as sea-level indicators based on a combination of the information from the publications and the results of the analyses. This research is a part of the USGS Coral Reef Ecosystem Studies Project (https://coastal.er.usgs.gov/crest/).

  14. OCO-2 Level 1A collated, parsed, science or calibration data V10...

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated Apr 11, 2025
    NASA/GSFC/SED/ESD/GCDC/GESDISC (2025). OCO-2 Level 1A collated, parsed, science or calibration data V10 (OCO2_L1aIn_Sample) at GES DISC [Dataset]. https://catalog.data.gov/dataset/oco-2-level-1a-collated-parsed-science-or-calibration-data-v10-oco2-l1ain-sample-at-ges-di
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Version 10 is the current version of the data set; older versions are no longer available and have been superseded by Version 10. The Orbiting Carbon Observatory is the first NASA mission designed to collect space-based measurements of atmospheric carbon dioxide with the precision, resolution, and coverage needed to characterize the processes controlling its buildup in the atmosphere. The OCO-2 project uses the LEOStar-2 spacecraft, which carries a single instrument incorporating three high-resolution spectrometers that make coincident measurements of reflected sunlight in the near-infrared CO2 bands near 1.61 and 2.06 micrometers and in the molecular oxygen (O2) A-band at 0.76 micrometers. The three spectrometers have different characteristics and are calibrated independently. Their raw data numbers (DN) are delivered, correlated in time, to the Level 1B process as Level 1A products. Each band has 1016 spectral elements, although some are masked out in the L2 retrieval. This product is the input to the Level 1B process: it is depacketized raw data formatted into a standard granularity with calibrated engineering data (for both science and calibration observations), in the Sample Mode of operation.

  15. Trips by Distance

    • s.cnmilf.com
    • catalog.data.gov
    Updated Feb 1, 2023
    Bureau of Transportation Statistics (2023). Trips by Distance [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/trips-by-distance
    Dataset updated
    Feb 1, 2023
    Dataset provided by
    Bureau of Transportation Statistics (http://www.rita.dot.gov/bts)
    Description

    Updates are delayed due to technical difficulties. How many people are staying at home? How far are people traveling when they don't stay home? Which states and counties have more people taking trips? The Bureau of Transportation Statistics (BTS) now provides answers to those questions through our new mobility statistics. The Trips by Distance data and the number of people staying home and not staying home are estimated for BTS by the Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland. The travel statistics are produced from an anonymized national panel of mobile device data from multiple sources. All data sources used in the creation of the metrics contain no personal information. Data analysis is conducted at the aggregate national, state, and county levels. A weighting procedure expands the sample of millions of mobile devices so the results are representative of the entire population in a nation, state, or county. To assure confidentiality and support data quality, no data are reported for a county if it has fewer than 50 devices in the sample on any given day.

    Trips are defined as movements that include a stay of longer than 10 minutes at an anonymized location away from home. Home locations are imputed on a weekly basis. A movement with multiple stays of longer than 10 minutes before returning home is counted as multiple trips. Trips capture travel by all modes of transportation, including driving, rail, transit, and air.

    The daily travel estimates are from a mobile device data panel merged from multiple data sources, which addresses the geographic and temporal sample variation issues often observed in a single data source. The merged data panel only includes mobile devices whose anonymized location data meet a set of data quality standards, which further ensures overall data quality and consistency. The data quality standards consider both the temporal frequency and spatial accuracy of anonymized location point observations, temporal coverage and representativeness at the device level, spatial representativeness at the sample and county level, etc. A multi-level weighting method that employs both device- and trip-level weights expands the sample to the underlying population at the county and state levels before travel statistics are computed.

    These data are experimental and may not meet all of our quality standards. Experimental data products are created using new data sources or methodologies that benefit data users in the absence of other relevant products. We are seeking feedback from data users and stakeholders on the quality and usefulness of these new products. Experimental data products that meet our quality standards and demonstrate sufficient user demand may enter regular production if resources permit.

  16. Daily Mobility Statistics

    • data.bts.gov
    • odgavaprod.ogopendata.com
    Updated Apr 30, 2024
    Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland (2024). Daily Mobility Statistics [Dataset]. https://data.bts.gov/widgets/w96p-f2qv
    Available download formats: csv, tsv, json, application/rdfxml, xml, application/rssxml
    Dataset updated
    Apr 30, 2024
    Dataset authored and provided by
    Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland
    License

    https://www.usa.gov/government-works

    Description

    The Daily Mobility Statistics were derived from a data panel constructed from several mobile data providers, a step taken to reduce the risks of geographic and temporal sample bias that would result from using a single data source. In turn, the merged data panel only included data from those mobile devices whose anonymized location data met a set of data quality standards, e.g., temporal frequency and spatial accuracy of anonymized location point observations, device-level temporal coverage and representativeness, and spatial distribution of data at the sample and county levels. After this filtering, final mobility statistics were computed using a multi-level weighting method that employed both device- and trip-level weights, thus expanding the sample represented by the devices in the data panel to the at-large populations of each state and county in the US.

    Data analysis was conducted at the aggregate national, state, and county levels. To assure confidentiality and support data quality, no data were reported for a county if it had fewer than 50 devices in the sample on any given day.

    Trips were defined as movements that included a stay of longer than 10 minutes at an anonymized location away from home. A movement with multiple stays of longer than 10 minutes--before returning home--was counted as multiple trips.
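    The trip definition above is mechanical enough to state in code. The sketch below is purely illustrative (not the production pipeline), applying the longer-than-10-minute rule to a hypothetical ordered list of stays for one device-day:

    # Illustrative sketch of the trip rule: each stay of longer than 10
    # minutes at a location away from home counts as one trip, so a movement
    # with multiple long stays before returning home yields multiple trips.
    def count_trips(stays, home="home", min_stay_minutes=10):
        trips = 0
        for location, stay_minutes in stays:
            if location != home and stay_minutes > min_stay_minutes:
                trips += 1
        return trips

    # One device-day with two qualifying stays away from home: two trips.
    day = [("home", 600), ("office", 480), ("store", 25), ("home", 720)]
    assert count_trips(day) == 2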

    The Daily Mobility Statistics data on this page, which cover the COVID and post-COVID periods, are experimental. Experimental data products are created using novel or exploratory data sources or methodologies that benefit data users in the absence of other statistically rigorous products, and they may not meet all BTS data quality standards.

  17. NOAA Aircraft Operations Center (AOC) Flight Level Data

    • catalog.data.gov
    • data.cnra.ca.gov
    • +4more
    Updated Sep 19, 2023
    NOAA National Centers for Environmental Information (Point of Contact) (2023). NOAA Aircraft Operations Center (AOC) Flight Level Data [Dataset]. https://catalog.data.gov/dataset/noaa-aircraft-operations-center-aoc-flight-level-data1
    Dataset updated
    Sep 19, 2023
    Dataset provided by
    National Centers for Environmental Information (https://www.ncei.noaa.gov/)
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    NOAA AOC WP-3D Research Flight Data is digital data set DSI-6420, archived at the National Climatic Data Center (NCDC). This data set consists of meteorological data gathered by Lockheed WP-3D Orion aircraft operated by the NOAA Aircraft Operations Center (AOC) at MacDill AFB, Florida; the data are provided by the Science and Engineering Division of AOC to NCDC. The WP-3D aircraft perform many projects throughout the year, such as hurricane research, atmospheric chemistry, thunderstorm investigations, and winter weather missions. Each of these projects consists of a series of individual flights; during hurricane projects, for instance, the P-3 may fly numerous flights through different tropical cyclones. For each archived project there are multiple directories consisting of individual flights. The data in these flight directories contain the actual raw meteorological parameters obtained from sensors located in different positions on the aircraft. The data is initially written to a digital data tape on the aircraft and later converted to a file for faster processing and archiving. Each flight folder also contains a scanned image of the actual flight manifest, the navigation log, and the mission observation logs. The flight-level data file contains measurements acquired at one-second intervals. The following is a generalized list of these measured parameters: time, GPS position data, inertial data, radar altimeter measurements, liquid water, total temperature, dewpoint temperature, attack pressure, slip pressure, differential attack and slip pressures, and static and dynamic pressure. Depending on the needs of each individual project, other sources of data will be added to or subtracted from this list. As of publication this record consists of 5 projects: 1) NOAA-42 Aircraft N42RF during the 2003 hurricane season, the Tamdar project, and a wind calibration flight. 2) NOAA-43 Aircraft N43RF: 2003 SAR pod, hurricane, and extratropical season, as well as the SFMR test flight missions. 3) NOAA-49 Aircraft N49RF: 2004 Winter Storms Experiment. 4) NOAA-43: NAME Experiment 2004. 5) NOAA-49 Aircraft N49RF: hurricane season 2004.

  18. Household Expenditure and Income Survey 2008, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    Updated Jan 12, 2022
    Department of Statistics (2022). Household Expenditure and Income Survey 2008, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/index.php/catalog/7661
    Dataset updated
    Jan 12, 2022
    Dataset authored and provided by
    Department of Statistics
    Time period covered
    2008 - 2009
    Area covered
    Jordan
    Description

    Abstract

    The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices, determine the characteristics of the poor, and prepare poverty maps. To achieve these goals, the sample had to be representative at the sub-district level.

    The raw survey data provided by the Statistical Office were cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once a dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.

    Data collected through the survey helped in achieving the following objectives:
    1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
    2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
    3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as the profession and educational level of the head of the household and other indicators
    4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
    5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
    6. Provide the necessary income data to serve in calculating poverty indices, identifying the characteristics of the poor, and drawing poverty maps
    7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty

    Geographic coverage

    National

    Analysis unit

    • Household/families
    • Individuals

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The 2008 Household Expenditure and Income Survey sample was designed using a stratified two-stage cluster sampling method. In the first stage, the primary sampling units (PSUs), the blocks, were drawn using probability proportionate to size, taking the number of households in each block as the block size. The second stage consisted of drawing the household sample (8 households from each PSU) using the systematic sampling method. Four substitute households from each PSU were also drawn, using the systematic sampling method, to be used on the first visit to the block in case any of the main sample households could not be visited for any reason.

    To estimate the sample size, the coefficient of variation and the design effect in each sub-district were calculated for the expenditure variable from data of the 2006 Household Expenditure and Income Survey. These results were used to estimate the sample size at the sub-district level, such that the coefficient of variation of the expenditure variable at the sub-district level did not exceed 10%, with a minimum of 6 clusters at the district level to ensure good cluster representation in the administrative areas and to enable drawing poverty pockets.

    Expected non-response, as well as the areas of major cities where poor families are concentrated, was taken into consideration in designing the sample; a larger sample was therefore drawn from these areas compared to other ones, to help reach and cover the poverty pockets.
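    As a concrete illustration of this two-stage design, the sketch below draws blocks with probability proportionate to their household counts and then eight households per block systematically. The block data and the with-replacement PPS draw are simplifying assumptions for illustration, not the Department of Statistics' actual frame or routine.

    import random

    def pps_sample(blocks, n_psu, rng):
        # Stage 1: draw PSUs (blocks) with probability proportionate to size,
        # where size is the number of households in the block. Sampling with
        # replacement here is a simplification of the actual design.
        weights = [b["households"] for b in blocks]
        return rng.choices(blocks, weights=weights, k=n_psu)

    def systematic_sample(n_units, k, rng):
        # Stage 2: systematic sampling of k household positions out of n_units.
        interval = n_units / k
        start = rng.uniform(0, interval)
        return [int(start + i * interval) for i in range(k)]

    rng = random.Random(2008)
    blocks = [{"id": i, "households": rng.randint(40, 200)} for i in range(500)]
    for block in pps_sample(blocks, n_psu=6, rng=rng):
        main = systematic_sample(block["households"], k=8, rng=rng)
        substitutes = systematic_sample(block["households"], k=4, rng=rng)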

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    List of survey questionnaires: (1) General Form (2) Expenditure on food commodities Form (3) Expenditure on non-food commodities Form

    Cleaning operations

    Raw Data

    The design and implementation of this survey involved the following procedures:
    1. Sample design and selection
    2. Design of forms/questionnaires, guidelines to assist in filling out the questionnaires, and preparation of instruction manuals
    3. Design of the table templates to be used for dissemination of the survey results
    4. Preparation of the fieldwork phase, including printing of forms/questionnaires, instruction manuals, data collection instructions, data checking instructions, and codebooks
    5. Selection and training of survey staff to collect data and run the required data checks
    6. Preparation and implementation of the pretest phase, designed to test and develop forms/questionnaires, instructions, and the software programs required for data processing and production of survey results
    7. Data collection
    8. Data checking and coding
    9. Data entry
    10. Data cleaning using data validation programs
    11. Data accuracy and consistency checks
    12. Data tabulation and preliminary results
    13. Preparation of the final report and dissemination of final results

    Harmonized Data

    • The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets
    • The harmonization process started with cleaning all raw data files received from the Statistical Office
    • Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization
    • A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables
    • A post-harmonization cleaning process was run on the data
    • Harmonized data was saved at the household as well as the individual level, in SPSS, and converted to STATA format

  19. Missing data in the analysis of multilevel and dependent data (Example data...

    • zenodo.org
    Updated Jul 20, 2023
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch (2023). Missing data in the analysis of multilevel and dependent data (Example data sets) [Dataset]. http://doi.org/10.5281/zenodo.7773614
    Available download formats: bin
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    Springer
    Authors
    Simon Grund; Oliver Lüdtke; Alexander Robitzsch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data sets for the book chapter titled "Missing Data in the Analysis of Multilevel and Dependent Data" submitted for publication in the second edition of "Dependent Data in Social Science Research" (Stemmler et al., 2015). This repository includes the data sets used in both example analyses (Examples 1 and 2) in two file formats (binary ".rda" for use in R; plain-text ".dat").

    The data sets contain simulated data from 23,376 (Example 1) and 23,072 (Example 2) individuals from 2,000 groups on four variables:

    ID = group identifier (1-2000)
    x = numeric (Level 1)
    y = numeric (Level 1)
    w = binary (Level 2)

    In all data sets, missing values are coded as "NA".
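    A minimal sketch for loading the plain-text ".dat" version in Python is shown below; the whitespace delimiter, the header row, and the file name are assumptions about the file layout, not documented properties of the archive.

    import pandas as pd

    # Read one of the plain-text example files; "NA" codes missing values,
    # as noted above. Delimiter and header row are assumed, not documented.
    df = pd.read_csv("example1.dat", sep=r"\s+", na_values="NA")

    # Per-variable missingness rate for the four documented columns.
    print(df[["ID", "x", "y", "w"]].isna().mean())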

  20. COVID-19 Case Surveillance Public Use Data

    • data.cdc.gov
    • opendatalab.com
    • +5more
    Updated Jul 9, 2024
    CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
    Available download formats: application/rdfxml, tsv, csv, json, xml, application/rssxml
    Dataset updated
    Jul 9, 2024
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Authors
    CDC Data, Analytics and Visualization Task Force
    License

    https://www.usa.gov/government-works

    Description

    Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

    Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

    This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC; it includes demographics, exposure history, disease severity indicators and outcomes, and the presence of any underlying medical conditions and risk behaviors, but no geographic data.

    CDC has three COVID-19 case surveillance datasets:

    The following apply to all three datasets:

    Overview

    The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

    For more information: NNDSS Supports the COVID-19 Response | CDC.

    The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

    COVID-19 Case Reports

    COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

    All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

    Data are Considered Provisional

    • The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
    • Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.
    • Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

    Data Limitations

    To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

    Data Quality Assurance Procedures

    CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:

    • Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question “Was the individual hospitalized?” where the possible answer choices include “Yes,” “No,” or “Unknown,” the blank value is recoded to Missing because the case report form did not include a response to the question.
    • Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately. (These first two checks are sketched just after this list.)
    • Additional data quality processing to recode free text data is ongoing. Data on symptoms, race and ethnicity, and healthcare worker status have been prioritized.
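    A minimal pandas sketch of the first two checks above, using hypothetical column names (hosp_yn, onset_dt); this is illustrative only, not CDC's production pipeline:

    import pandas as pd

    df = pd.DataFrame({
        "hosp_yn": ["Yes", "", "No"],                    # blank = unanswered
        "onset_dt": ["2021-01-04", "2999-01-01", None],  # one future date
    })

    # Check 1: reclassify blank (unanswered) responses as Missing.
    df["hosp_yn"] = df["hosp_yn"].replace("", "Missing")

    # Check 2: logic check on dates; a symptom onset date in the future is
    # set to null pending correction by the reporting jurisdiction.
    df["onset_dt"] = pd.to_datetime(df["onset_dt"], errors="coerce")
    df.loc[df["onset_dt"] > pd.Timestamp.today(), "onset_dt"] = pd.NaT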

    Data Suppression

    To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
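    The low-frequency suppression rule can be sketched the same way; the demographic column names below are hypothetical:

    import pandas as pd

    def suppress_rare_cells(df, cols, threshold=5):
        # Re-code demographic combinations that occur fewer than `threshold`
        # times to the "NA" answer option; records are kept, never removed.
        counts = df.groupby(cols)[cols[0]].transform("size")
        out = df.copy()
        out.loc[counts < threshold, cols] = "NA"
        return out

    # Usage with hypothetical columns:
    # cleaned = suppress_rare_cells(cases, ["sex", "age_group", "race_ethnicity"])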

    For questions, please contact Ask SRRG (eocevent394@cdc.gov).

    Additional COVID-19 Data

    COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
