Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Empirical studies in various social sciences often involve categorical outcomes with inherent ordering, such as self-evaluations of subjective well-being and self-assessments in health domains. While ordered choice models, such as the ordered logit and ordered probit, are popular tools for analyzing these outcomes, they may impose restrictive parametric and distributional assumptions. This article introduces a novel estimator, the ordered correlation forest, that can naturally handle nonlinearities in the data and does not assume a specific error term distribution. The proposed estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class. Under an “honesty” condition, predictions are consistent and asymptotically normal. The weights induced by each forest are used to obtain standard errors for the predicted probabilities and the covariates’ marginal effects. Evidence from synthetic data shows that the proposed estimator achieves better prediction performance than alternative forest-based estimators and can construct valid confidence intervals for the covariates’ marginal effects. Comparisons using various real-world data sets further highlight the advantages of forest-based estimators over parametric models in larger samples while showing that the ordered correlation forest remains competitive in smaller samples.
New York City Department of Education 2014 - 2017 Regents
Testing and score data includes all administrations of the Regents exam: January, June, and August. It reports the highest score for each student for each Regents exam for each school year. Non-numeric marks are dropped from the data.
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
See the Splitgraph documentation for more information.
The Quarterly Labour Force Survey (QLFS) is a household-based sample survey conducted by Statistics South Africa (Stats SA). It collects data on the labour market activities of individuals aged 15 years and above who live in South Africa.
National Coverage
Individuals, households
The QLFS sample covers the non-institutional population except for workers' hostels. However, persons living in private dwelling units within institutions are also enumerated. For example, within a school compound, one would enumerate the schoolmaster's house and teachers' accommodation because these are private dwellings. Students living in a dormitory on the school compound would, however, be excluded.
Sample survey data [ssd]
Survey requirements and design :
The Labour Force Survey frame has been developed as a general purpose household survey frame that can be used by all other household surveys irrespective of the sample size requirement of the survey. The sample size for the QLFS is roughly 30 000 dwellings, divided equally into four rotation groups, i.e. 7 500 dwellings per rotation group. The sample is based on information collected during the 2001 Population Census conducted by Stats SA. In preparation for the 2001 census, the country was divided into 80 787 enumeration areas (EAs). Some of these EAs are small in terms of the number of households that were enumerated in them at the time of Census 2001. Stats SA's household-based surveys use a Master Sample which comprises EAs that are drawn from across the country. For the purposes of the Master Sample, the EAs that contained fewer than 25 households were excluded from the sampling frame, and those that contained between 25 and 99 households were combined with other EAs to form Primary Sampling Units (PSUs). The number of EAs per PSU ranges between one and four. On the other hand, very large EAs represent two or more PSUs. The sample is designed to be representative at the provincial level and within provinces at the metro/non-metro level. Within the metros, the sample is further distributed by geography type. The four geography types are: urban formal, urban informal, farms and tribal. This implies, for example, that within a metropolitan area the sample is designed to be representative of the different geography types that may exist within that metro. The current sample size is 3 080 PSUs. It is equally divided into four sub-groups or panels called rotation groups. The rotation groups are designed in such a way that each of these groups has the same distribution pattern as that which is observed in the whole sample.
They are numbered from one to four and these numbers also correspond to the quarters of the year in which the sample will be rotated for the particular group. The sample for the redesigned Labour Force Survey is based on a stratified two-stage design with probability proportional to size (PPS) sampling of primary sampling units (PSUs) in the first stage, and sampling of dwelling units (DUs) with systematic sampling in the second stage.
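The two-stage design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Stats SA's implementation: the PSU identifiers, household counts, sample sizes and the function names `pps_systematic` and `systematic_sample` are all hypothetical, and the first stage uses systematic PPS as one common way of drawing units with probability proportional to size.

```python
import random


def pps_systematic(sizes, n, rng):
    """First stage: pick n PSUs with probability proportional to size.

    sizes maps a PSU id to its measure of size (e.g. a household count).
    Selection points are laid at a fixed step along the cumulated sizes,
    starting from a random offset (systematic PPS).
    """
    step = sum(sizes.values()) / n
    start = rng.random() * step  # random start in [0, step)
    points = [start + k * step for k in range(n)]
    selected, cumulative, i = [], 0.0, 0
    for psu, size in sizes.items():
        cumulative += size
        # A PSU is selected once for every point that falls inside its range.
        while i < n and points[i] < cumulative:
            selected.append(psu)
            i += 1
    return selected


def systematic_sample(units, n, rng):
    """Second stage: pick n units (n <= len(units)) at a fixed interval."""
    interval = len(units) / n
    start = rng.random() * interval  # random start within the first interval
    return [units[int(start + k * interval)] for k in range(n)]


rng = random.Random(2001)  # fixed seed so the sketch is reproducible
# Hypothetical PSUs and household counts, for illustration only.
psu_sizes = {f"PSU-{i:02d}": size
             for i, size in enumerate([120, 45, 300, 80, 210, 60, 150, 95], 1)}
chosen_psus = pps_systematic(psu_sizes, n=3, rng=rng)
# Dwellings inside each chosen PSU are numbered 1..size and sampled systematically.
dwellings = {psu: systematic_sample(list(range(1, psu_sizes[psu] + 1)), 10, rng)
             for psu in chosen_psus}
print(chosen_psus)
```

In this sketch a PSU larger than the sampling step could be selected more than once, which loosely mirrors the statement that very large EAs represent two or more PSUs.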
Sample rotation :
The sampled PSUs have been assigned to 4 rotation groups, and dwellings selected from the PSUs assigned to rotation group "1" are rotated in the first quarter. Similarly, the dwellings selected from the PSUs assigned to rotation group "2" are rotated in the second quarter, and so on. Thus, each sampled dwelling will remain in the sample for four consecutive quarters. It should be noted that the sampling unit is the dwelling, and the unit of observation is the household. Therefore, if a household moves out of a dwelling after being in the sample for, say 2 quarters and a new household moves in then the new household will be enumerated for the next two quarters. If no household moves into the sampled dwelling, the dwelling will be classified as vacant (unoccupied). Each quarter, ¼ of the sampled dwellings rotate out of the sample and are replaced by new dwellings from the same PSU or the next PSU on the list. A total of 3 080 PSUs were selected for the redesigned LFS, and 770 have been assigned to each of the four rotation groups.
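The rotation arithmetic above can be checked with a short sketch. The totals come from the text; the entry year 2008 and the function name `quarters_in_sample` are hypothetical and used only for illustration.

```python
TOTAL_PSUS = 3080      # from the text
ROTATION_GROUPS = 4    # from the text

psus_per_group = TOTAL_PSUS // ROTATION_GROUPS
# One quarter of the dwellings rotates out each quarter, so the rest is retained.
share_retained = 1 - 1 / ROTATION_GROUPS


def quarters_in_sample(entry_year, entry_quarter, n_quarters=ROTATION_GROUPS):
    """Return the (year, quarter) pairs in which a dwelling is interviewed."""
    result = []
    year, quarter = entry_year, entry_quarter
    for _ in range(n_quarters):
        result.append((year, quarter))
        quarter += 1
        if quarter > 4:  # roll over into the next calendar year
            quarter, year = 1, year + 1
    return result


# A dwelling in rotation group 3 enters in the third quarter (hypothetical year).
schedule = quarters_in_sample(2008, 3)
print(psus_per_group, share_retained, schedule)
```

Under these figures, three quarters of the sampled dwellings are common to any two consecutive quarters.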
Face-to-face [f2f]
The questionnaire consists of the following sections:
Section 1 - Biographical information (marital status, language, migration, education, training, literacy, etc.)
Section 2 - Economic activities
Section 3 - Unemployment and economic inactivity
Section 4 - Main work activities in the last week
Section 5 - Earnings in the main job
All sections - Comprehensive coverage of all aspects of the labour market
Data Processing
Introduction : The purpose of data processing is to ensure that the information collected from the sampled primary sampling units, dwelling units and households (i.e. the boxes containing QLFS questionnaires) is physically received, stored and processed. The aim is to produce a clean dataset that has all the information contained in the questionnaires. Except for the scanning system, all other elements of the data processing system were developed in-house. One important innovation that is central to the smooth operation of the entire system is the development of barcodes that are linked to a unique number on each questionnaire. This information provides the link between the information recorded in the Master Sample database and other processes such as editing and imputation as well as weighting and variance estimation.
Processing phases : QLFS data processing is continuous, starting on the second week of every month. Data processing for each quarter must be completed by the first Friday of the subsequent month to ensure that the four-week deadline for publication of the QLFS results is met.
The phases listed below occur sequentially.
Receiving of questionnaires : The contents of the boxes containing questionnaires sent from the regional offices are verified when received at the DPC. The questionnaire barcodes captured in the provinces are captured again at the DPC to ensure that all questionnaires have been received.
Primary preparation : The purpose of primary preparation is to ensure that all questionnaires are correctly stacked and positioned prior to being guillotined.
Guillotining: The purpose of the guillotine process is to cut off the spines of the questionnaires in order to have pages separated for scanning.
Secondary preparation : The purpose of secondary preparation is to ensure that the questionnaires are correctly stacked and positioned for scanning. At the same time, quality assurance takes place on the work done during the primary preparation and guillotining processes.
Scanning : The purpose of scanning and recognition is to convert the questionnaires into an electronic format and Tagged Image File Format (TIFF) images.
Verification : The purpose of scanning verification is to manually correct un-interpretable characters, missing data and errors detected by validation rules.
Electronic coding: Industry and occupation codes are assigned using the electronic coding system which converts the respondents' industry and occupation descriptions into numeric codes based on Standard Industry Classification (SIC) and South African Standard Occupation Classification (SASCO). If the system fails to assign a code for either industry or occupation, the coding is assigned manually.
Automated editing and imputation : QLFS uses the editing and imputation module to ensure that output data is both clean and complete. There are three basic components, called functions, in the Edit and Imputation Module:
Function A: Record acceptance
Function B: Edit and imputation
Function C: Clean up, derived variables and preparation for weighting
Function A: Record acceptance
This function is divided into three phases:
First phase: Pre-function A : The first phase ensures that the records contain valid information in selected Cover Page questions required during edit and imputation and during the subsequent weighting and variance estimation. Any blanks or other errors that need to be corrected are done here before processing of the record can proceed.
Second phase: Function A record acceptance : The second phase ensures that there is enough demographic and labour market activity information to ensure that editing and imputation can be successfully completed.
Third phase: Post Function A clean up : This phase ensures that certain data are present where there is evidence that they should be. This involves, for example:
• Ensuring that if there is written material in the job description questions, there are corresponding industry and occupation codes for it.
• Ensuring that partial blanks or non-numeric characters that appear in questions where the Survey Officer is required to enter numbers are validated.
• Ensuring that where there is written material in the space provided for "Other - specify", the corresponding option is marked.
Function B: Edit and imputation : Having determined in Function A that the content of the record would support extensive editing and imputation, this function carries out those activities. Editing is the
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Haryana: Rural: Non Food: Consumer Services data was reported at 1,422.000 Unit in 2012. This records a decrease from the previous number of 1,431.000 Unit for 2010. HCE: Number of Sample Households Reporting Consumption: Haryana: Rural: Non Food: Consumer Services data is updated yearly, averaging 1,431.000 Unit from Jun 2005 (Median) to 2012, with 3 observations. The data reached an all-time high of 1,668.000 Unit in 2005 and a record low of 1,422.000 Unit in 2012. HCE: Number of Sample Households Reporting Consumption: Haryana: Rural: Non Food: Consumer Services data remains active status in CEIC and is reported by Ministry of Statistics and Programme Implementation. The data is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB040: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: Haryana: Rural (Discontinued).
The dataset details 2021 Budget Recommendations, which is the line-item budget document proposed by the Mayor to the City Council for approval. Budgeted expenditures are identified by department, appropriation account, and funding type: Local, Community Development Block Grant Program (CDBG), and other Grants. “Local” funds refer to those line items that are balanced with locally-generated revenue sources, including but not limited to the Corporate Fund, Water Fund, Midway and O’Hare Airport funds, Vehicle Tax Fund, Library Fund and General Obligation Bond funds.
This dataset follows the format of the equivalent datasets from past years except that Appropriation Authority and Appropriation Account have changed from Number to Text in order to accommodate non-numeric values.
For more information about the budget process, visit the Budget Documents page: http://j.mp/lPotWf.
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
See the Splitgraph documentation for more information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
http://dx.doi.org/10.6084/m9.figshare.1615873
Data sets and supplementary information for Sepulveda et al., The shifting climate portfolio of the Greater Yellowstone Area.
All units are metric (mm, degrees C, cubic meters). Missing values are marked with "nan," which stands for "not a number." Entries are missing either because the data are completely absent or because a time period had insufficient data for an accurate calculation. See Sepulveda et al. for details.
SNOTEL temperature data were identical to NRCS sources at the time of writing, but NRCS plans to recalculate its historical datasets in the near future. This may result in some differences between the data provided here and the data available in future from the NRCS web sites.
Files with "normalized data" were normalized as follows:
1. For each stream gage or SNOTEL, separately calculate the mean and standard deviation for each parameter during the period spanning water years 1993-1994 through 2012-2013. These are defined as "mean" and "sd."
2. For each stream gage or SNOTEL, separately calculate the z value as z = (x-mean) / sd for each parameter for each water year, where x = the annual value for a parameter at a specific gage / SNOTEL. This results in normalized time series for each gage / SNOTEL that are all on a common scale of measure.
3. To find the zone or "overall" mean for a particular parameter for a particular water year, average all the z-scores available in that year for that parameter. The zonal z values appear in the columns labelled "overall_{parameter}" in the normalized files below.
4. Stations or gages that do not have complete data during the reference period 1993-1994 through 2012-2013 (stations with short or poor quality records) have been excluded from normalization. You will notice that normalized values for these stations have all been replaced with "nan." This is to ensure that the same number of gages / stations were used to calculate the overall zone averages and sd (step 2) for each year.
This restriction was relaxed for the SNOTEL monthly Tmax, Tmin, and Precip files presented below because too many SNOTEL stations were eliminated, and upon examination, it was found that each station only had a handful of missing months. We can relax this condition for all files, but exploratory graphs show that doing so produces jumpy or discontinuous time series that seem less likely to reveal true patterns.
Notice that "peak date" and other non-numeric fields cannot be normalized because they are expressed as calendar references, e.g. 6/01/1958. These columns are all replaced with "nan" in the normalized files, but there are numeric equivalents that have been normalized. For example, stream peak dates are available numerically in the variable "peak_index," which is the numbered day of the water year at which peak occurred.
Files:
meta.csv - metadata describing the weather stations used.
tx_reduced_stn_set_snotel_monthly_tmin.csv - monthly averages of daily Tmin from TopoWx (Oyler et al) infilled and corrected station data files
tx_reduced_stn_set_snotel_monthly_tmax.csv - monthly averages of daily Tmax from TopoWx (Oyler et al) infilled and corrected station data files
Annual_SNOTEL_stats.csv - For each water year at each station in the GYA, the following variables are reported:
- Peak Snow Water Equivalent (PWE) - millimeters
- Peak snow day - day on which peak SWE occurred expressed as number of days since the start of the water year (October 1)
- Peak snow date - day on which peak swe occurred expressed as year / month
- Winter Length - Number of days with Snow Water Equivalent (SWE) greater than zero
april_1_swe.csv - Snow water equivalent (mm) on April 1 for each year at each station.
monthly_pwe.csv - For each month during each water year at each station, Peak Snow Water Equivalent (mm). Months are indicated in the header as numbers. For example, gunsight_pass_01 is the PWE for January at the Gunsight Pass SNOTEL station.
Normalized_monthly_pwe_all_stations.csv - normalized monthly peak swe. See above for normalization procedure.
snotel_monthly_tmax.csv - monthly averages of daily tmax for snotel stations calculated from NRCS data. NOTE all SNOTEL temperature data are identical to NRCS sources at the time of writing but NRCS plans to recalculate their daily historical values in the near future. Future comparisons to NRCS web sites may reveal some differences.
snotel_monthly_tmin.csv - monthly averages of daily tmin for snotel stations calculated from NRCS data.
melt_out_dates.csv - Day of complete melt out (zero swe) for each water year, expressed as number of days after October 1
stream_summaries_05012014.csv - For each water year at each station, the following variables are presented:
- StationName_median_# (e.g. soda_boundary_01) = Median flow value (cubic meters per second) for the numbered month. Months are numbered 1 - 12
- StationName_min_# = Minimum daily flow (cubic meters per second) for the numbered month. Months are numbered 1 - 12.
- date_half_disch = Calendar date at which centroid of flow was reached
- index_half_disch = Day of water year (number of days after October 1) on which centroid was reached
- height_half_disch = flow rate (cubic meters per second) on the date that centroid of flow was reached
- peak_date = Calendar date of peak flow
- peak_index = Day of water year (number of days after October 1) on which peak flow occurred
- peak_cms = Peak flow (cubic meters per second)
- min_date, min_index, min_cms = Same as the last 3 above but for minimum flow during each water year
- total_vol = total volume (cubic meters) of water for each water year
- peak_minus_min = Peak flow rate - min flow rate
- moving_25th_percentile_flow = 25th percentile flow (cubic meters per second) for the 10 year period including the water year listed and the previous 9 years.
- days_below_moving_25th = # days in the listed water year that had flow below the moving 25th percentile
- days_below_recent_25th = # days below the 25th percentile flow during the period 1981 - 2010.
- gradient_index = A hydrograph "spikiness" index which is calculated as the sum of all the first derivative values in a water year
- days_above_recent_winter_90th = The number of days during November - March in each water year that exceeds the November - March 90th percentile flow. 90th percentile for Nov - Mar is calculated during the years 2001 - 2010
- days_below_recent_summer_10th = The number of days during July - September in each water year that are below the 10th percentile flow for July - September. 10th percentile calculated over 2001 - 2010.
- days_below_recent_summer_25th = The number of days during July - September in each water year that are below the 25th percentile flow for July - September. 25th percentile is calculated over 2001 - 2010.
- est_vals = The number of flow measurements in each water year that are estimated (have data flag e). Estimated values USUALLY occur when there is ice on the gage.
Normalized_stream_data_all_gages.csv - normalized stream statistics. See normalization procedure above.
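The normalization procedure described for the "normalized data" files (steps 1 to 4) can be sketched as below. This is an illustration under assumptions rather than the authors' script: the function names are hypothetical, water years are assumed to be labelled by their ending calendar year (so the reference period becomes 1994 to 2013), and the sample standard deviation is used because the description does not say which form of sd was applied.

```python
import math
from statistics import mean, stdev

# Assumption: water years 1993-1994 through 2012-2013, labelled by ending year.
REFERENCE_YEARS = range(1994, 2014)


def normalize_station(series, reference_years=REFERENCE_YEARS):
    """Steps 1, 2 and 4: z-scores for one station (dict of year -> value)."""
    ref = [series.get(year, math.nan) for year in reference_years]
    # Step 4: stations with an incomplete reference period become all "nan".
    if any(math.isnan(value) for value in ref):
        return {year: math.nan for year in series}
    m, sd = mean(ref), stdev(ref)  # step 1
    if sd == 0:
        return {year: math.nan for year in series}
    return {year: (x - m) / sd for year, x in series.items()}  # step 2


def overall_z(normalized, year):
    """Step 3: average the z-scores available across stations for one year."""
    values = [z[year] for z in normalized.values()
              if year in z and not math.isnan(z[year])]
    return mean(values) if values else math.nan


# Hypothetical stations and a shortened reference period, for illustration only.
stations = {
    "station_a": {2000: 1.0, 2001: 2.0, 2002: 4.0, 2003: 6.0},
    "station_b": {2001: 5.0, 2002: math.nan, 2003: 7.0},  # incomplete record
    "station_c": {2000: 10.0, 2001: 10.0, 2002: 20.0, 2003: 30.0},
}
normalized = {name: normalize_station(series, range(2001, 2004))
              for name, series in stations.items()}
overall_2000 = overall_z(normalized, 2000)
print(normalized["station_a"], overall_2000)
```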
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To further study the expansion characteristics of left-turning non-motorized vehicles at intersections and the relationship between these characteristics and vehicle-bicycle conflicts, trajectory point data of left-turning non-motorized vehicles are extracted using video trajectory tracking technology, and the cubic curve expansion envelope equation with the highest degree of fit is constructed. To quantify the expansion degree of non-motorized vehicles after starting, two intersections in Guangxi Zhuang Autonomous Region were selected for case analysis, and the numerical range of the expansion degree was obtained for the intersection with a left-turn waiting area and for the intersection without one. The mathematical relationship between the expansion degree and its influencing factors is studied, and a multivariate nonlinear regression equation is established between the expansion degree and the left-turn non-motorized vehicle flow, the number of parallel non-motorized vehicles, and the left-turn green light time. The vehicle-bicycle conflicts caused by the expansion of left-turning non-motorized vehicles are analyzed, the essential factors affecting the number of non-motorized vehicles are determined, and a multiple linear regression equation is established between the number of non-motorized vehicles and the number of left-turning non-motorized vehicles, the expansion degree, and the number of parallel non-motorized vehicles. The results show that the model has high accuracy. By analyzing the expansion characteristics of left-turning non-motorized vehicles at intersections, the relationship between different influencing factors and the expansion degree is obtained. The vehicle-bicycle conflicts under the influence of expansion characteristics are then analyzed, providing theoretical ideas for improving traffic efficiency and optimizing traffic organization at intersections.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Non-acoustic speech sensing system based on flexible piezoelectric
Version 1.0.0
This Read_Me.txt file briefly describes the non-acoustic speech dataset and instructions to access it.
The non-acoustic speech sensing system based on flexible piezoelectric is designed to satisfy specific needs around testing device models (in high-noise, complex environments). The system collected vibration signals from the jaws of six males and five females containing ten different control commands at 90 dB of background noise. The dataset is reliable with high intelligibility and is able to achieve 93.7% recognition accuracy by calculation. In general, this paper provides a non-acoustic speech dataset for Mandarin, including the parts collected, the number of people collected, and the environment.
The dataset is available at:
https://doi.org/10.5281/zenodo.7090120
The data descriptor paper with details of data collection and cleaning process is under submission. For proper citation of the manuscript, please refer to the latest version of this dataset which includes the details.
This dataset and its descriptor paper were created by:
Shiji Yuan, Ying Sun, Dezhi Zheng, Xinlei Chen, Ying Ding, Shuai Wang, Shangchun Fan
For questions or suggestions, please e-mail Dezhi Zheng
Description:
Ten common words were chosen as the core of the vocabulary in this dataset. These ten command words can be used for commands in IoT or robotics applications: "forward", "backward", "right", "left", "stop", "up", "down", "draw", "drop", and "reset".
The recording software is Adobe Audition 2022, which uses mono recording, a 16-bit storage format and a 16 kHz sampling frequency; the recorded voice is saved in wav format. The dataset is provided under two storage rules, classified by subject number and by corpus number. Under the first rule, the speech data of the 11 subjects are stored in separate folders, with the subject serial number as the folder name. Each folder contains subfolders categorized by corpus. Under the second rule, the speech data of the ten corpora are stored in separate folders, and the folder names are the corpus contents. The subject number, corpus number and record order are given for each data entry. For example, the data obtained when subject one recorded corpus 10 for the first time are labeled 1-10_1.
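The labeling scheme in the example (1-10_1) can be read programmatically as subject, corpus and record order. The sketch below assumes every label follows the pattern subject-corpus_order and that files carry a .wav extension; the names `RecordingLabel` and `parse_label` are hypothetical and are not part of the dataset's own scripts.

```python
import re
from pathlib import Path
from typing import NamedTuple


class RecordingLabel(NamedTuple):
    subject: int  # subject number (1 to 11 in this dataset)
    corpus: int   # corpus (command word) number (1 to 10)
    order: int    # record order, i.e. which repetition this is


# Assumed pattern: "<subject>-<corpus>_<order>", e.g. "1-10_1".
_LABEL_PATTERN = re.compile(r"(\d+)-(\d+)_(\d+)")


def parse_label(label):
    """Split a label such as '1-10_1' into its three numeric parts."""
    match = _LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return RecordingLabel(*(int(part) for part in match.groups()))


example = parse_label("1-10_1")
# Assumption: the label is the file name without its extension.
from_file = parse_label(Path("1-10_1.wav").stem)
print(example)
```

The mapping from corpus numbers to the ten command words is not stated in the description, so the sketch leaves the corpus as a number.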
After the data collection process, a filtering algorithm for automatic detection of low non-acoustic speech data is designed to remove problematic data that are very short or very quiet. The script of the data filtering algorithm is provided in this repository.
For specific detail of the data filtering process, please refer to the script (speech data filtering algorithm in MATLAB) in this repository and the data descriptor paper.
The dataset in this repository is the processed version. The raw dataset and removed audio files are not included in this repository.
File list:
Non-acoustic Speech Dataset.zip
speech data filtering algorithm.zip
Readme.txt
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source:
Creator: Michael Redmond (redmond '@' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA -- culled from 1990 US Census, 1995 US FBI Uniform Crime Report, 1990 US Law Enforcement Management and Administrative Statistics Survey, available from ICPSR at U of Michigan. -- Donor: Michael Redmond (redmond '@' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA -- Date: July 2009
Data Set Information:
Many variables are included so that algorithms that select or learn weights for attributes could be tested. However, clearly unrelated attributes were not included; attributes were picked if there was any plausible connection to crime (N=122), plus the attribute to be predicted (Per Capita Violent Crimes). The variables included in the dataset describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units.
The per capita violent crimes variable was calculated using population and the sum of crime variables considered violent crimes in the United States: murder, rape, robbery, and assault. There was apparently some controversy in some states concerning the counting of rapes. These resulted in missing values for rape, which resulted in incorrect values for per capita violent crime. These cities are not included in the dataset. Many of these omitted communities were from the midwestern USA.
Data is described below based on original values. All numeric data was normalized into the decimal range 0.00-1.00 using an Unsupervised, equal-interval binning method. Attributes retain their distribution and skew (hence for example the population attribute has a mean value of 0.06 because most communities are small). E.g. An attribute described as 'mean people per household' is actually the normalized (0-1) version of that value.
The normalization preserves rough ratios of values WITHIN an attribute (e.g. double the value for double the population within the available precision - except for extreme values (all values more than 3 SD above the mean are normalized to 1.00; all values more than 3 SD below the mean are normalized to 0.00)).
However, the normalization does not preserve relationships between values BETWEEN attributes (e.g. it would not be meaningful to compare the value for whitePerCap with the value for blackPerCap for a community)
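The description does not give the exact normalization formula, so the following is only one reading that is consistent with it: values beyond 3 SD of the mean are clipped, the remaining range is rescaled linearly to 0-1, and the result is rounded to two decimals (equal-width bins of 0.01). The function name `normalize_attribute`, the use of the sample standard deviation and the example values are all assumptions.

```python
from statistics import mean, stdev


def normalize_attribute(values, n_sd=3, decimals=2):
    """Clip at mean +/- n_sd standard deviations, then rescale to 0.00-1.00."""
    m, sd = mean(values), stdev(values)
    # Bounds never extend past the observed data, so a skewed attribute
    # keeps its skew after rescaling.
    lo = max(min(values), m - n_sd * sd)
    hi = min(max(values), m + n_sd * sd)
    if hi == lo:
        return [0.0 for _ in values]
    scaled = []
    for x in values:
        clipped = min(max(x, lo), hi)  # extreme values collapse to the bounds
        scaled.append(round((clipped - lo) / (hi - lo), decimals))
    return scaled


# Hypothetical attribute: many small values and one extreme value.
population = list(range(1, 20)) + [1000]
normalized_population = normalize_attribute(population)
print(normalized_population)
```

Because each attribute is rescaled with its own bounds, the outputs are comparable within an attribute but not between attributes, which matches the caution given above.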
A limitation was that the LEMAS survey was of the police departments with at least 100 officers, plus a random sample of smaller departments. For our purposes, communities not found in both census and crime datasets were omitted. Many communities are missing LEMAS data.
Attribute Information:
(125 predictive, 4 non-predictive, 18 potential goal)
communityname: Community name - not predictive - for information only (string)
state: US state (by 2 letter postal abbreviation) (nominal)
countyCode: numeric code for county - not predictive, and many missing values (numeric)
communityCode: numeric code for community - not predictive and many missing values (numeric)
fold: fold number for non-random 10 fold cross validation, potentially useful for debugging, paired tests - not predictive (numeric - integer)
population: population for community (numeric - expected to be integer)
householdsize: mean people per household (numeric - decimal)
racepctblack: percentage of population that is african american (numeric - decimal)
racePctWhite: percentage of population that is caucasian (numeric - decimal)
racePctAsian: percentage of population that is of asian heritage (numeric - decimal)
racePctHisp: percentage of population that is of hispanic heritage (numeric - decimal)
agePct12t21: percentage of population that is 12-21 in age (numeric - decimal)
agePct12t29: percentage of population that is 12-29 in age (numeric - decimal)
agePct16t24: percentage of population that is 16-24 in age (numeric - decimal)
agePct65up: percentage of population that is 65 and over in age (numeric - decimal)
numbUrban: number of people living in areas classified as urban (numeric - expected to be integer)
pctUrban: percentage of people living in areas classified as urban (numeric - decimal)
medIncome: median household income (numeric - may be integer)
pctWWage: percentage of households with wage or salary income in 1989 (numeric - decimal)
pctWFarmSelf: percentage of households with farm or self employment income in 1989 (numeric - decimal)
pctWInvInc: percentage of households with investment / rent income in 1989 (numeric - decimal)
pctWSocSec: percentage of households with social security income in 1989 (numeric - decimal)
pctWPubAsst: pe...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Release Date: 2024-08-08.

The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. 7504866, Disclosure Review Board (DRB) approval number: 2021 NES-D approval number: CBDRB-FY24-0307; 2022 ABS approval number: CBDRB-FY23-0479).

Key Table Information:
Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms).
Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series).
Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2022 Annual Business Survey (ABS) collection. Data are also obtained from administrative records, the 2017 Economic Census, and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2022 ABS collection year produces statistics for the 2021 reference year. The "Year" column in the table is the reference year.

Data Items and Other Identifying Records:
Data include estimates on:
    Total number of employer and nonemployer firms
    Total sales and receipts of employer and nonemployer firms (reported in $1,000 of dollars)
    Number of nonemployer firms (firms without paid employees)
    Sales and receipts of nonemployer firms (reported in $1,000s of dollars)
    Number of employer firms (firms with paid employees)
    Sales and receipts of employer firms (reported in $1,000s of dollars)
    Number of employees (during the March 12 pay period)
    Annual payroll of employer firms (reported in $1,000s of dollars)

These data are aggregated by the following demographic classifications of firm for:
    All firms
        Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
            Sex: Female; Male; Equally male/female (50% / 50%)
            Ethnicity: Hispanic; Equally Hispanic/non-Hispanic (50% / 50%); Non-Hispanic
            Race: White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Minority (Firms classified as any race and ethnicity combination other than non-Hispanic and White); Equally minority/nonminority (50% / 50%); Nonminority (Firms classified as non-Hispanic and White)
            Veteran Status (defined as having served in any branch of the U.S. Armed Forces): Veteran; Equally veteran/nonveteran (50% / 50%); Nonveteran
        Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)

Data Notes:
    Business ownership is defined as having 51 percent or more of the stock or equity in the business. Data are provided for firms owned equally (50% / 50%) by men and women, by Hispanics and non-Hispanics, by minorities and nonminorities, and by veterans and nonveterans. Firms not classifiable by sex, ethnicity, race, and veteran status are counted and tabulated separately.
    The detail may not add to the total or subtotal because a Hispanic firm may be of any race; because a firm could be tabulated in more than one racial group; or because the number of nonemployer firm's data are rounded.
    Nonemployer data do not have standard error or relative standard error columns as these data are from the universe of nonemployer firms, not from a data sample.

Industry and Geography Coverage:
The data are shown for the total for all sectors (00) and 2-digit NAICS code levels for:
    United States
    States and the District of Columbia
    Metropolitan Statistical Areas
    County
Data are also shown for the 3- and 4-digit NAICS code for:
    United States
Nonemployer data are excluded for the following NAICS industries:
    Crop and Animal Production (NAICS 111 and 112)
    Rail Transportation (NAICS 482)
    Postal Service (NAICS 491)
    Monetary Authorities-Central Bank (NAICS 521)
    Funds, Trusts, and Other Financial Vehicles (NAICS 525)
    Management of Companies and Enterprises (NAICS 55)
    Private Households (NAICS 814)
    Public Administration (NAICS 92)
    Industries Not Classified (NAICS 99)
For more information about NAICS, see NAICS Codes & Understanding Industry Classification Systems. For information about geographies used by economic programs at the Census Bureau, see Economic Census: Economic Geographies.

Employer Data Footnotes:
Footnote 660 - Agriculture, forestry...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Chandigarh: Urban: Non Food: Consumer Services data was reported at 244.000 Unit in 2012. This records a decrease from the previous number of 271.000 Unit for 2010. The data is updated yearly, with a median of 271.000 Unit from Jun 2005 to 2012 across 3 observations. It reached an all-time high of 278.000 Unit in 2005 and a record low of 244.000 Unit in 2012. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB029: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: Chandigarh: Urban (Discontinued).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Haryana: Urban: Non Food: Tobacco data was reported at 337.000 Unit in 2012. This records a decrease from the previous number of 435.000 Unit for 2010. The data is updated yearly, with a median of 435.000 Unit from Jun 2005 to 2012 across 3 observations. It reached an all-time high of 488.000 Unit in 2005 and a record low of 337.000 Unit in 2012. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB041: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: Haryana: Urban (Discontinued).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Haryana: Urban: Non Food: Clothing data was reported at 716.000 Unit in 2012. This records an increase from the previous number of 669.000 Unit for 2010. The data is updated yearly, with a median of 511.500 Unit from Jun 1994 to 2012 across 4 observations. It reached an all-time high of 716.000 Unit in 2012 and a record low of 139.000 Unit in 1994. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB041: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: Haryana: Urban (Discontinued).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Haryana: Urban: Non Food: Conveyance data was reported at 1,058.000 Unit in 2012. This records an increase from the previous number of 982.000 Unit for 2010. The data is updated yearly, with a median of 982.000 Unit from Jun 2005 to 2012 across 3 observations. It reached an all-time high of 1,058.000 Unit in 2012 and a record low of 717.000 Unit in 2005. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB041: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: Haryana: Urban (Discontinued).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Bihar: Rural: Non Food: Medical: Non Institutional data was reported at 2,841.000 Unit in 2012. This records an increase from the previous number of 2,487.000 Unit for 2010. The data is updated yearly, with a median of 2,731.000 Unit from Jun 2005 to 2012 across 3 observations. It reached an all-time high of 2,841.000 Unit in 2012 and a record low of 2,487.000 Unit in 2010. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB026: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: Bihar: Rural (Discontinued).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Lakshadweep: Urban: Non Food data was reported at 127.000 Unit in 2012. This records a decrease from the previous number of 128.000 Unit for 2010. The data is updated yearly, with a median of 128.000 Unit from Jun 2005 to 2012 across 3 observations. It reached an all-time high of 129.000 Unit in 2005 and a record low of 127.000 Unit in 2012. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB053: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: Lakshadweep: Urban (Discontinued).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Chandigarh: Urban: Non Food data was reported at 248.000 Unit in 2012. This records a decrease from the previous number of 273.000 Unit for 2010. The data is updated yearly, with a median of 273.000 Unit from Jun 2005 to 2012 across 3 observations. It reached an all-time high of 300.000 Unit in 2005 and a record low of 248.000 Unit in 2012. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB029: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: Chandigarh: Urban (Discontinued).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HCE: Number of Sample Households Reporting Consumption: Delhi: Urban: Non Food data was reported at 887.000 Unit in 2012. This records an increase from the previous number of 842.000 Unit for 2010. The data is updated yearly, with a median of 887.000 Unit from Jun 2005 to 2012 across 3 observations. It reached an all-time high of 1,101.000 Unit in 2005 and a record low of 842.000 Unit in 2010. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under India Premium Database’s Domestic Trade and Household Survey – Table IN.HB067: HCES: Uniform Reference Period (URP): Average Monthly Per Capita Consumption Expenditure (MPCE): by Item Group: NCT of Delhi: Urban (Discontinued).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Empirical studies in various social sciences often involve categorical outcomes with inherent ordering, such as self-evaluations of subjective well-being and self-assessments in health domains. While ordered choice models, such as the ordered logit and ordered probit, are popular tools for analyzing these outcomes, they may impose restrictive parametric and distributional assumptions. This article introduces a novel estimator, the ordered correlation forest, that can naturally handle nonlinearities in the data and does not assume a specific error term distribution. The proposed estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class. Under an “honesty” condition, predictions are consistent and asymptotically normal. The weights induced by each forest are used to obtain standard errors for the predicted probabilities and the covariates’ marginal effects. Evidence from synthetic data shows that the proposed estimator achieves better prediction performance than alternative forest-based estimators and demonstrates its ability to construct valid confidence intervals for the covariates’ marginal effects. Comparisons using various real-world data sets further highlight the advantages of forest-based estimators over parametric models in larger samples while showing that the ordered correlation forest remains competitive in smaller samples.