37 datasets found

Mayor’s Office of Operations: Demographic Survey
catalog.data.gov
data.cityofnewyork.us
+2more
Updated Jul 19, 2025
Cite
data.cityofnewyork.us (2025). Mayor’s Office of Operations: Demographic Survey [Dataset]. https://catalog.data.gov/dataset/mayors-office-of-operations-demographic-survey
Dataset updated
Jul 19, 2025
Dataset provided by
data.cityofnewyork.us
Description
Pursuant to Local Laws 126, 127, and 128 of 2016, certain demographic data is collected voluntarily and anonymously from persons seeking social services. This data can be used by agencies and the public to better understand the demographic makeup of client populations and to better understand and serve residents of all backgrounds and identities. The data presented here has been collected through either electronic forms or paper surveys offered at the point of application for services. These surveys are anonymous. Each record represents an anonymized demographic profile of an individual applicant for social services, disaggregated by response option, agency, and program. Response options include information regarding ancestry, race, primary and secondary languages, English proficiency, gender identity, and sexual orientation.

Idiosyncrasies or Limitations: Note that while the dataset contains the total number of individuals who have identified their ancestry or languages spoken, because such data is collected anonymously, there may be instances of a single individual completing multiple voluntary surveys. Additionally, the survey being both voluntary and anonymous has advantages as well as disadvantages: it increases the likelihood of full and honest answers, but since it is not connected to the individual case, it does not directly inform delivery of services to the applicant. The paper and online versions of the survey ask the same questions, but free-form text is handled differently. Free-form text fields are expected to be entered in English although the form is available in several languages. Surveys are presented in 11 languages.

Paper Surveys
1. Are optional
2. Survey taker is expected to specify the agency that provides the service
3. Survey taker can skip or elect not to answer questions
4. Invalid/unreadable data may be entered for the survey date, or the date may be skipped
5. OCR of free-form text fields may fail
6. Analytical value of free-form text answers is unclear

Online Survey
1. Is optional
2. Agency is defaulted based on the URL
3. Some questions must be answered
4. Date of survey is automated
AmeriCorps Members Demographic
data.americorps.gov
catalog.data.gov
+1more
application/rdfxml +5
Updated Oct 3, 2018
+ more versions
Cite
(2018). AmeriCorps Members Demographic [Dataset]. https://data.americorps.gov/National-Service/AmeriCorps-Members-Demographic/2ca3-89j5
Available download formats: tsv, application/rssxml, csv, xml, json, application/rdfxml
Dataset updated
Oct 3, 2018
Description
The data is prepared using AmeriCorps members who began service on any day in fiscal year (FY) 2017. The members may have served 1 to 365 days during their term. Members who are in never served, disqualified, pre-service, or deferred statuses were excluded from this analysis. AmeriCorps VISTA and AmeriCorps NCCC race and ethnicity data come from the member application to serve. The code to extract the data between the two programs is the same. The ASN race and ethnicity data comes from the enrollment form. The enrollment form may exist multiple times if the member enrolled in more than one term. It is not uncommon for each enrollment form to have conflicting information about the member’s race and ethnicity. The member may have enrollment form data for terms served outside of the timeframe of the dataset. For example, if we are reporting on members who began service in FY17, then a member who also served in FY16 may have race and ethnicity information in the FY16 enrollment form and no race or ethnicity information or conflicting information in the FY17 enrollment form. In the case of conflicting information, this analysis assumes each instance of race designation is correct. If a member reports themselves as “Asian or Asian American” in one enrollment form and “White” in another enrollment form, then the analysis categorizes this person as someone who identifies with multiple race selections vs. one or the other. In the case of ethnicity, if a member indicates that they are not Hispanic or Latino/a in one form, but that they are in another, this analysis assumes the affirmative—and they will be categorized as Hispanic or Latino/a. Lastly, the totals include the total results from the query plus the difference between the query and the raw count of members who started service in that fiscal year. The members who did not have a record in the invite table and enrollment table were added to the non-response category. 
Senior Corps Figures come from the Annual Progress Report Supplement as of April 11, 2018. Percentages are calculated from totals of the subcategories, excluding the non-response categories.
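The reconciliation rules described above (every reported race designation is treated as correct, and an affirmative Hispanic or Latino/a answer takes precedence) can be sketched in Python. This is a minimal illustration, not the code AmeriCorps used: the `EnrollmentForm` class, its field names, and the labels "Multiple race selections", "Not Hispanic or Latino/a", and "Non-response" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EnrollmentForm:
    """One enrollment form for a member (hypothetical structure)."""
    races: frozenset          # race designations selected on this form
    hispanic: Optional[bool]  # None means the question was not answered


def reconcile(forms: Iterable[EnrollmentForm]) -> dict:
    """Combine all of a member's forms into one race and one ethnicity category."""
    forms = list(forms)

    # Every race designation on any form is assumed to be correct,
    # so conflicting forms are merged rather than one being preferred.
    all_races = set()
    for form in forms:
        all_races |= form.races
    if not all_races:
        race = "Non-response"
    elif len(all_races) == 1:
        race = next(iter(all_races))
    else:
        race = "Multiple race selections"

    # For ethnicity, a single affirmative answer outweighs any negative ones.
    answers = [f.hispanic for f in forms if f.hispanic is not None]
    if any(answers):
        ethnicity = "Hispanic or Latino/a"
    elif answers:
        ethnicity = "Not Hispanic or Latino/a"
    else:
        ethnicity = "Non-response"

    return {"race": race, "ethnicity": ethnicity}
```

For instance, a member with one form listing "Asian or Asian American" and another listing "White" would fall into the multiple-selection category under this sketch.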
Population by Tracts 2018
gstore.unm.edu
Updated Mar 6, 2020
+ more versions
Cite
(2020). Population by Tracts 2018 [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/adecfea6-fcd7-4c41-8165-165c4490a9da/metadata/ISO-19115:2003.html
Dataset updated
Mar 6, 2020
Time period covered
2018
Area covered
West Bound -109.050173 East Bound -103.001964 North Bound 37.000293 South Bound 31.332172
Description
A broad and generalized selection of 2014-2018 US Census Bureau 2018 5-year American Community Survey population data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico Census tracts). The selection is not comprehensive, but allows a first-level characterization of total population, male and female, and both broad and narrowly-defined age groups. In addition to the standard selection of age-group breakdowns (by male or female), the dataset provides supplemental calculated fields which combine several attributes into one (for example, the total population of persons under 18, or the number of females over 65 years of age). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users.

The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. While the ACS contains margin of error (MOE) information, this dataset does not. 
Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by Census tract boundaries in New Mexico. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
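The reserved tract code ranges listed above lend themselves to a small lookup. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and it assumes the tract code is supplied as a digit string whose first four digits are the basic code that the ranges refer to (any two-digit suffix is ignored).

```python
def classify_tract_code(tract_code: str) -> str:
    """Map a 2010 Census tract code to the special category implied by its range."""
    # Assumption: the first four digits carry the basic tract code.
    base = int(tract_code[:4])
    if 9400 <= base <= 9499:
        return "American Indian area"
    if 9800 <= base <= 9899:
        return "special land use, little or no population"
    if 9900 <= base <= 9998:
        return "water only"
    return "standard"
```

A tract with basic code 9401 would be reported as an American Indian area, while an ordinary code such as 0001 falls through to the standard category.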
COVID-19 Case Surveillance Public Use Data
data.cdc.gov
opendatalab.com
+6more
application/rdfxml +5
Updated Jul 9, 2024
+ more versions
Cite
CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
Available download formats: application/rdfxml, tsv, csv, json, xml, application/rssxml
Dataset updated
Jul 9, 2024
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Authors
CDC Data, Analytics and Visualization Task Force
License
https://www.usa.gov/government-works
Description
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.

CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (33 data elements)
The following apply to all three datasets:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data cells are suppressed to protect individual privacy.
The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the current datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html
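The 14-day lag rule in the list above can be expressed as a simple date comparison. This is a sketch under stated assumptions rather than CDC's actual pipeline: the function and parameter names are hypothetical, and `earliest_date` stands for the earliest date available in a record.

```python
from datetime import date, timedelta


def eligible_for_release(earliest_date: date, dataset_created: date,
                         lag_days: int = 14) -> bool:
    """Return True if the record's earliest date is at least `lag_days` before dataset creation."""
    return earliest_date <= dataset_created - timedelta(days=lag_days)
```

Under this sketch, a record dated exactly 14 days before the dataset creation date is included, and one dated 13 days before is held back until a later release.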

Overview

The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

For more information: NNDSS Supports the COVID-19 Response | CDC.

The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

COVID-19 Case Reports

COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

Data are Considered Provisional

The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.
Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

Data Limitations

To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

Data Quality Assurance Procedures

CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question “Was the individual hospitalized?” where the possible answer choices include “Yes,” “No,” or “Unknown,” the blank value is recoded to Missing because the case report form did not include a response to the question.
Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
Additional data quality processing to recode free text data is ongoing. Data on symptoms, race and ethnicity, and healthcare worker status have been prioritized.
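The first two cleaning steps above (recoding blanks to Missing and nulling illogical dates) can be sketched as follows. The function names are hypothetical, and the only illogical-date rule implemented is the future-onset example given in the text; CDC's full set of logic checks is not described here.

```python
from datetime import date
from typing import Optional


def recode_blank(answer: Optional[str]) -> str:
    """Recode an unanswered (blank) question to the Missing value."""
    if answer is None or answer.strip() == "":
        return "Missing"
    return answer


def check_onset_date(onset: Optional[date], today: date) -> Optional[date]:
    """Set a symptom onset date in the future to null; keep any other value."""
    if onset is not None and onset > today:
        return None
    return onset
```

Answers such as "Yes", "No", or "Unknown" pass through unchanged; only a blank becomes "Missing".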

Data Suppression

To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
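A rough sketch of low-frequency suppression follows. It assumes, for illustration only, that records are dictionaries with hypothetical keys `sex`, `age_group`, and `race_ethnicity`, and that a demographic combination shared by fewer than 5 records has those fields recoded to "NA". The exact rules CDC applies are not spelled out in this description, so this should not be read as their method.

```python
from collections import Counter


def suppress_rare(records, keys=("sex", "age_group", "race_ethnicity"), threshold=5):
    """Recode rare demographic combinations to "NA" without dropping any record."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    result = []
    for r in records:
        combo = tuple(r[k] for k in keys)
        cleaned = dict(r)  # copy so the input records are left untouched
        if counts[combo] < threshold:
            for k in keys:
                cleaned[k] = "NA"
        result.append(cleaned)
    return result
```

The output always has the same number of records as the input, mirroring the statement that suppressed records are never removed.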

For questions, please contact Ask SRRG (eocevent394@cdc.gov).

Additional COVID-19 Data

COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
Demographic Performa
figshare.com
docx
Updated Apr 7, 2022
Cite
Radhika Pai (2022). Demographic Performa [Dataset]. http://doi.org/10.6084/m9.figshare.19499072.v1
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.19499072.v1
Dataset updated
Apr 7, 2022
Dataset provided by
figshare
Authors
Radhika Pai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This form describes the sociodemographic characteristics of the participants.
ZUMA Standard Demography (Time Series) - Dataset - B2FIND
b2find.eudat.eu
Updated May 3, 2023
+ more versions
Cite
(2023). ZUMA Standard Demography (Time Series) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/83e4a93a-5bb3-57c6-9445-113a9160b988
Dataset updated
May 3, 2023
Description
Integrated data set of the ZUMA standard demography from 9 representative surveys. Based on the standard demography, the demographic data of the respondents is determined in detailed form. Topics: age; sex; education level and employment; current or last occupational position; company size; place of work; primary source of income; marital status; type of employment of spouse; last occupational position and education level of spouse; occupational position and education level of father; religious denomination; frequency of church attendance; political interest; participation in the Federal Parliament election and party preference; memberships; residential status; local residency; type of city; city size; residential status; self-assessment of social class. Interviewer rating: presence of other persons and their degree of relationship to respondent; intervention of others in the interview; reliability and willingness of respondent to cooperate; length of interview.
Multi-stage stratified random samples in 10 different investigations.
Replication Package for ML-EUP Conversational Agent Study
zenodo.org
pdf
Updated Jul 12, 2024
Cite
Anonymous; Anonymous (2024). Replication Package for ML-EUP Conversational Agent Study [Dataset]. http://doi.org/10.5281/zenodo.7780223
Available download formats: pdf
Unique identifier
https://doi.org/10.5281/zenodo.7780223
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anonymous; Anonymous
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication Package Files

1. Forms.zip: contains the forms used to collect data for the experiment

2. Experiments.zip: contains the participants’ and sandboxers’ experimental task workflow with Newton.

3. Responses.zip: contains the responses collected from participants during the experiments.

4. Analysis.zip: contains the data analysis scripts and results of the experiments.

5. newton.zip: contains the tool we used for the WoZ experiment.

TutorialStudy.pdf: script used in the experiment with and without Newton to be consistent with all participants.

Woz_Script.pdf: script wizard used to maintain consistent Newton responses among the participants.

1. Forms.zip

The forms zip contains the following files:

Demographics.pdf: a PDF form used to collect demographic information from participants before the experiments

Post-Task Control (without the tool).pdf: a PDF form used to collect data from participants about challenges and interactions when performing the task without Newton

Post-Task Newton (with the tool).pdf: a PDF form used to collect data from participants after the task with Newton.

Post-Study Questionnaire.pdf: a PDF form used to collect data from the participant after the experiment.

2. Experiments.zip

The experiments zip contains two types of folders:

exp[participant’s number]-c[number of dataset used for control task]e[number of dataset used for experimental task]. Example: exp1-c2e1 (experiment participant 1 - control used dataset 2, experimental used dataset 1)

sandboxing[sandboxer’s number]. Example: sandboxing1 (experiment with sandboxer 1)
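The folder naming scheme above can be parsed with a pair of regular expressions. The sketch below is not part of the replication package; the function name and the returned dictionary keys are hypothetical. It also accepts the experimental dataset before the control dataset and an optional trailing suffix, since the package description mentions a folder named exp6-e2c1-serverdied.

```python
import re

# exp<participant>-c<control>e<experimental>, or the e...c... order, plus an optional suffix
EXPERIMENT = re.compile(
    r"^exp(?P<participant>\d+)-"
    r"(?:c(?P<c_first>\d+)e(?P<e_second>\d+)|e(?P<e_first>\d+)c(?P<c_second>\d+))"
    r"(?:-(?P<note>.+))?$"
)
SANDBOX = re.compile(r"^sandboxing(?P<sandboxer>\d+)$")


def parse_folder(name: str) -> dict:
    """Split an Experiments.zip folder name into its numbered parts."""
    m = EXPERIMENT.match(name)
    if m:
        return {
            "kind": "experiment",
            "participant": int(m["participant"]),
            "control_dataset": int(m["c_first"] or m["c_second"]),
            "experimental_dataset": int(m["e_second"] or m["e_first"]),
            "note": m["note"],
        }
    m = SANDBOX.match(name)
    if m:
        return {"kind": "sandboxing", "sandboxer": int(m["sandboxer"])}
    raise ValueError(f"unrecognized folder name: {name!r}")
```

Calling `parse_folder("exp1-c2e1")` yields participant 1 with control dataset 2 and experimental dataset 1, matching the example in the description.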

Every experiment subfolder contains:

warmup.json: a JSON file with the results of Newton-Participant interactions in the chat for the warmup task.

warmup.ipynb: a Jupyter notebook file with the participant’s results from the code provided by Newton in the warmup task.

sample1.csv: Death Event dataset.

sample2.csv: Heart Disease dataset.

tool.ipynb: a Jupyter notebook file with the participant’s results from the code provided by Newton in the experimental task.

python.ipynb: a Jupyter notebook file with the participant’s results from the code they tried during the control task.

results.json: a JSON file with the results of Newton-Participant interactions in the chat for the task with Newton.

To load an experiment chat log into Newton, add the following code to the notebook:

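import anachat
import json

# File name as given in the original instructions; the folder listing above calls it results.json.
with open("result.json", "r") as f:
    # Replace Newton's chat history with the saved interaction log.
    anachat.comm.COMM.history = json.load(f)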

Then, click on the notebook name inside the Newton chat.

Note 1: the subfolder for P6 is exp6-e2c1-serverdied because the experiment server died before we were able to save the logs. We reconstructed them using the notebook newton_remake.ipynb based on the video recording.

Note 2: The sandboxing occurred during the development of Newton. We did not collect all the files, and the format of JSON files is different than the one supported by the attached version of Newton.

3. Responses.zip

The responses zip contains the following files:

demographics.csv: a CSV file containing the responses collected from participants using the demographics form

task_newton.csv: a CSV file containing the responses collected from participants using the post-task newton form.

task_control.csv: a CSV file containing the responses collected from participants using the post-task control form.

post_study.csv: a CSV file containing the responses collected from participants using the post-study questionnaire form.

4. Analysis.zip

The analysis zip contains the following files:

1.Challenge.ipynb: a Jupyter notebook file where the perceptions of challenges figure was created.

2.Interactions.py: a Python file where the participants’ JSON files were created.

3.Interactions.Graph.ipynb: a Jupyter notebook file where the participant’s interaction figure was created.

4.Interactions.Count.ipynb: a Jupyter notebook file that counts participants’ interaction with each figure.

config_interactions.py: this file contains the definitions of interaction colors and grouping

interactions.json: a JSON file with the interactions during the Newton task of each participant based on the categorization.

requirements.txt: dependencies required to run the code to generate the graphs and json analysis.

To run the analyses, install the dependencies on Python 3.10 with the following command, then execute the scripts and notebooks in order:

pip install -r requirements.txt

5. newton.zip

The newton zip contains the source code of the Jupyter Lab extension we used in the experiments. Read the README.md file inside it for instructions on how to install and run it.
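Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Race for the U.S., States, and Metro Areas: 2018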
data.census.gov
Updated Dec 16, 2021
+ more versions
Cite
United States Census Bureau (2021). Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Race for the U.S., States, and Metro Areas: 2018 [Dataset]. https://data.census.gov/table/ABSNESD2018.AB00MYNESD01C?q=Santa+Cruz+County,+California+Business+and+Economy&g=050XX00US06065,06087_040XX00US06&y=2018
Dataset updated
Dec 16, 2021
Dataset provided by
United States Census Bureau (http://census.gov/)
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Key Table Information

Table Title: Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Race for the U.S., States, and Metro Areas: 2018
Table ID: ABSNESD2018.AB00MYNESD01C
Survey/Program: Economic Surveys
Year: 2018
Dataset: ECNSVY Nonemployer Statistics by Demographics Company Summary
Source: U.S. Census Bureau, 2018 Economic Surveys, Nonemployer Statistics by Demographics
Release Date: 2021-12-16
Release Schedule: The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.
Sponsor: National Center for Science and Engineering Statistics, U.S. National Science Foundation

Table Universe

Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms).
Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series).
Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2019 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2017 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered.
Data are also obtained from administrative records and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2019 ABS collection year produces statistics for the 2018 reference year. The "Year" column in the table is the reference year.

Methodology

Data Items and Other Identifying Records:
Total number of employer and nonemployer firms
Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
Number of nonemployer firms
Sales, value of shipments, or revenue of nonemployer firms ($1,000)
Number of employer firms
Sales, value of shipments, or revenue of employer firms ($1,000)
Number of employees
Annual payroll ($1,000)

These data are aggregated by the following demographic classifications of firm for:
All firms
Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
Race
White
Black or African American
American Indian and Alaska Native
Asian
Native Hawaiian and Other Pacific Islander
Minority (Firms classified as any race and ethnicity combination other than non-Hispanic and White)
Equally minority/nonminority
Nonminority (Firms classified as non-Hispanic and White)
Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)

Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.

Unit(s) of Observation: The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.

Geography Coverage: Data are shown for the total for all sectors (00) and the 2-digit NAICS levels for the U.S., states and District of Columbia, and metro areas. For information about geographies, see Geographies.

Industry Coverage: The data are shown for the total of all sectors ("00"), and at the 2-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors. Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.
The following are excluded from the total of all sectors:
Crop and Animal Production (NAICS 111 and 112)
Rail Transportation (NAICS 482)
Postal Service (NAICS 491)
Monetary Authorities-Central Bank (NAICS 521)
Funds, Trusts, and Other Financial Vehicles (NAICS 525)
Management of Companies and Enterprises (NAICS 55)
Private Households (NAICS 814)
Public Administration (NAICS 92)
Industries Not Classified (NAICS 99)
For information about NAICS, see North American Industry Classification System.

Sampling: NES-D nonemployer data are not conducted through sampling. Nonemployer Statistics (NES) data originate from statistical information obtained through business income tax records that the Internal Revenue Service (IRS) provides to the Census Bureau. The NES-D adds demographic characteristics to the NES da...
American Community Survey
gstore.unm.edu
csv, geojson, gml +5
Updated Mar 6, 2020
+ more versions
Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/92f102fa-5d6c-41b6-8cf9-132f78a30e02/metadata/FGDC-STD-001-1998.html
Explore at:
Available download formats: csv(5), zip(5), json(5), gml(5), geojson(5), xls(5), shp(5), kml(5)
Dataset updated
Mar 6, 2020
Dataset provided by
Earth Data Analysis Center
Time period covered
2017
Area covered
West Bounding Coordinate -109.050173 East Bounding Coordinate -103.001964 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.332172, New Mexico
Description
A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey population data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico Census tracts). The selection is not comprehensive, but allows a first-level characterization of total population, male and female, and both broad and narrowly-defined age groups. In addition to the standard selection of age-group breakdowns (by male or female), the dataset provides supplemental calculated fields which combine several attributes into one (for example, the total population of persons under 18, or the number of females over 65 years of age). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users.

The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. While the ACS contains margin of error (MOE) information, this dataset does not.
Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by Census tract boundaries in New Mexico. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
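The reserved 2010 tract code ranges described above can be checked mechanically. The following is a minimal sketch only: it assumes the caller passes the four-digit basic tract code as an integer, and the function name and category labels are hypothetical, not part of the dataset.

```python
def classify_tract_code(basic_code: int) -> str:
    """Map a 2010 Census basic tract code to the reserved range it falls in.

    The returned labels are illustrative names, not official Census terms.
    """
    if 9400 <= basic_code <= 9499:
        return "american_indian_area"   # majority American Indian population or lands
    if 9800 <= basic_code <= 9899:
        return "special_land_use"       # little or no population, e.g. a National Park
    if 9900 <= basic_code <= 9998:
        return "water_only"             # tracts containing only water area
    return "standard"


if __name__ == "__main__":
    for code in (1, 9400, 9850, 9998):
        print(code, classify_tract_code(code))
```

A filter like this could be used to set aside water-only or special land use tracts before computing population summaries.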
Dataset of depression and anxiety among the elderly derived from The...
scidb.cn
Updated Nov 9, 2022
Ying Yan; Qi Huang (2022). Dataset of depression and anxiety among the elderly derived from The Nottingham Longitudinal Study of Activity and Ageing (NLSAA) project [Dataset]. http://doi.org/10.57760/sciencedb.06263
Explore at:
Unique identifier
https://doi.org/10.57760/sciencedb.06263
Dataset updated
Nov 9, 2022
Dataset provided by
Science Data Bank
Authors
Ying Yan; Qi Huang
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This dataset is derived from the Nottingham Longitudinal Study of Activity and Ageing (NLSAA). NLSAA is an 8-year survey of people aged 65 and above, which collects demographic information and a large amount of life data on this population. The baseline survey (T1) was conducted in the summer of 1985. During this period, the data collection team randomly sampled 1,299 people aged 65 and above from the list provided by general practitioners in Nottinghamshire, and interviewed them. The population was then followed up every four years, at T2 (in the summer of 1989) and T3 (in the summer of 1993). The final NLSAA data contain 1263 variables and 1042 observations. The data describing the prevalence of depression and anxiety among the elderly in NLSAA were extracted and used to form this dataset.

In NLSAA, we take the samples with depression and anxiety (psych_=1) as positive, and the samples without depression and anxiety (psych_=0) as negative. To balance the sample categories in the dataset, we extract both the positive and the negative samples from the T1 survey, and only the positive samples from the T2 and T3 surveys, as the observations of the dataset. Then, following the relevant literature, we extract the risk variables for depression and anxiety in the elderly from NLSAA as the variables of the dataset. As a result, this dataset contains 1152 valid observations and 54 risk variables for depression and anxiety in the elderly.

Note: To access the original NLSAA dataset, please contact Professor Kevin Morgan (https://www.lboro.ac.uk/departments/ssehs/staff/kevin-morgan/, E-mail Address: K.Morgan@lboro.ac.uk) for permission to access the dataset and obtain a copy.
Expert opinions of demographic rates of Argentine black and white tegus in...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
+ more versions
U.S. Geological Survey (2024). Expert opinions of demographic rates of Argentine black and white tegus in South Florida [Dataset]. https://catalog.data.gov/dataset/expert-opinions-of-demographic-rates-of-argentine-black-and-white-tegus-in-south-florida
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Area covered
Florida, South Florida
Description
We illustrate the utility of expert elicitation, explicit recognition of uncertainty, and the value of information for directing management and research efforts for invasive species, using tegu lizards (Salvator merianae) in southern Florida as a case study. We posited a post-birth-pulse matrix model, which was parameterized using a 3-point process to elicit estimates of tegu demographic rates from herpetology experts. We fit statistical distributions for each parameter and for each expert, then drew and pooled a large number of replicate samples from these to form a distribution for each demographic parameter. Using these distributions, we generated a large sample of matrix models to infer how the tegu population might respond to control efforts. We used the concepts of Pareto efficiency and stochastic dominance to conclude that targeting older age classes at relatively high rates appears to have the best chance of minimizing tegu abundance and control costs. Expert opinion combined with an explicit consideration of uncertainty can be valuable for conducting an initial assessment of the effort needed to control the invader. The value of information can be used to focus research in a way that not only helps increase the efficacy of control, but minimizes costs as well.
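The pooling step in this description can be illustrated with a short sketch. It is not the authors' procedure: the description does not say which distributions were fitted, so a triangular distribution over each expert's (low, most likely, high) estimate is assumed here, and the function name and all numeric values are hypothetical.

```python
import random


def pool_expert_draws(estimates, draws_per_expert=1000, seed=0):
    """Pool replicate draws from one distribution per expert.

    estimates: list of (low, most_likely, high) tuples, one per expert.
    A triangular distribution is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    pooled = []
    for low, most_likely, high in estimates:
        # random.triangular takes (low, high, mode) in that order.
        pooled.extend(
            rng.triangular(low, high, most_likely) for _ in range(draws_per_expert)
        )
    return pooled


if __name__ == "__main__":
    # Hypothetical three-point estimates of an annual survival rate from two experts.
    survival = pool_expert_draws([(0.2, 0.4, 0.6), (0.3, 0.5, 0.9)])
    print(len(survival), min(survival), max(survival))
```

Each pooled draw could then supply one parameter value for one replicate matrix model.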
Census of Population and Housing, 1980 [United States]: Public Use Microdata...
search.gesis.org
Updated Feb 1, 2001
+ more versions
United States Department of Commerce. Bureau of the Census (2001). Census of Population and Housing, 1980 [United States]: Public Use Microdata Sample (A Sample): 5-Percent Sample - Version 2 [Dataset]. http://doi.org/10.3886/ICPSR08101.v2
Explore at:
Unique identifier
https://doi.org/10.3886/ICPSR08101.v2
Dataset updated
Feb 1, 2001
Dataset provided by
GESIS search
ICPSR - Interuniversity Consortium for Political and Social Research
Authors
United States Department of Commerce. Bureau of the Census
License
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de442616
Area covered
United States
Description
Abstract (en): The Public Use Microdata Samples (PUMS) contain person- and household-level information from the "long-form" questionnaires distributed to a sample of the population enumerated in the 1980 Census. This data collection, containing 5-percent data, identifies every state, county groups, and most individual counties with 100,000 or more inhabitants (350 in all). In many cases, individual cities or groups of places with 100,000 or more inhabitants are also identified. Household-level variables include housing tenure, year structure was built, number and types of rooms in dwelling, plumbing facilities, heating equipment, taxes and mortgage costs, number of children, and household and family income. The person record contains demographic items such as sex, age, marital status, race, Spanish origin, income, occupation, transportation to work, and education. All persons and housing units in the United States and Puerto Rico. For this data collection, the full 1980 Census sample that received the "long-form" questionnaire (19.4 percent of all households) was sampled again through a stratified systematic selection procedure with probability proportional to a measure of size. This 5-percent sample, i.e., 5 households for every 100 households in the nation, includes over one-fourth of the households that received the long-form questionnaire. 
2006-01-12 All files were removed from dataset 81 and flagged as study-level files, so that they will accompany all downloads.
2006-01-12 All files were removed from dataset 80 and flagged as study-level files, so that they will accompany all downloads.
1997-08-25 Part 72, Puerto Rico data, has been added to the collection, as well as supplemental documentation for Puerto Rico in the form of a separate PDF file.

The household and person records in each hierarchical data file have logical record lengths of 193 characters, but the number of records varies with each file.
The record layout for Part 72, Puerto Rico, is different from the state datasets. Refer to the supplemental documentation for this part.
The codebook is available in hardcopy form only, while the Puerto Rico supplemental documentation is provided as a Portable Document Format (PDF) file.
2021 Long Form Census - Ward Data
open.ottawa.ca
hub.arcgis.com
+3more
Updated Nov 28, 2023
+ more versions
City of Ottawa (2023). 2021 Long Form Census - Ward Data [Dataset]. https://open.ottawa.ca/datasets/59b3ea74ecea4a8a9d7fb797190bbdb8
Explore at:
Dataset updated
Nov 28, 2023
Dataset authored and provided by
City of Ottawa
License
https://ottawa.ca/en/city-hall/get-know-your-city/open-data#open-data-licence-version-2-0
Description
The 2021 long form Census questionnaire was sent out to 25% of all households. The 2021 short form Census questionnaire was sent out to 100% of all households. Because one is a census and one is a sample survey, variables that are available in both the 100% data and 25% sample may have different values. For example, the total population of the city taken from the 25% sample could differ from that taken from the 100% data.

Source: Statistics Canada, 2021 Census, Custom Tabulation, census profile data for user-specified ward areas. Data received November 2023.
Date Created: November 22 2023
Update Frequency: Updated with each five-year national Census (next census undertaken in 2026; updated ward data are expected in 2028)
Data Steward: Eva Walrond
Data Steward Email: Eva.walrond@ottawa.ca
Department or Agency: Planning, Real Estate and Economic Development
Branch/Unit: Research & Forecasting
Replication Package for ML-EUP Conversational Agent Study
zenodo.org
pdf
Updated Jul 11, 2024
Emily Arteaga Garcia; João Felipe Pimentel; Zixuan Feng; Marco Gerosa; Igor Steinmacher; Anita Sarma (2024). Replication Package for ML-EUP Conversational Agent Study [Dataset]. http://doi.org/10.5281/zenodo.8327190
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.5281/zenodo.8327190
Dataset updated
Jul 11, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Emily Arteaga Garcia; João Felipe Pimentel; Zixuan Feng; Marco Gerosa; Igor Steinmacher; Anita Sarma
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the replication package of the paper How to Support ML End-User Programmers through a Conversational Agent, published at ICSE 2024.

Replication Package Files

1. Forms.zip: contains the forms used to collect data for the experiment

2. Experiments.zip: contains the participants’ and sandboxers’ experimental task workflow with Newton.

3. Responses.zip: contains the responses collected from participants during the experiments.

4. Analysis.zip: contains the data analysis scripts and results of the experiments.

5. newton.zip: contains the tool we used for the WoZ experiment.

Interactions.pdf: explains Figure 4 of the paper in detail by depicting the interactions of P4.

TutorialStudy.pdf: script used in the experiment with and without Newton to be consistent with all participants.

Woz_Script.pdf: script the wizard used to maintain consistent Newton responses among the participants.

1. Forms.zip

The forms zip contains the following files:

Demographics.pdf: a PDF form used to collect demographic information from participants before the experiments

Post-Task Control (without the tool).pdf: a PDF form used to collect data from participants about challenges and interactions when performing the task without Newton

Post-Task Newton (with the tool).pdf: a PDF form used to collect data from participants after the task with Newton.

Post-Study Questionnaire.pdf: a PDF form used to collect data from the participant after the experiment.

2. Experiments.zip

The experiments zip contains two types of folders:

exp[participant’s number]-c[number of dataset used for control task]e[number of dataset used for experimental task]. Example: exp1-c2e1 (experiment participant 1 - control used dataset 2, experimental used dataset 1)

sandboxing[sandboxer’s number]. Example: sandboxing1 (experiment with sandboxer 1)
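The folder naming convention above can be parsed programmatically when iterating over Experiments.zip. The sketch below is an illustration only: the function name and returned keys are hypothetical, and it assumes the control and experimental parts may appear in either order and may be followed by a free-text suffix, as in the exp6-e2c1-serverdied folder mentioned later in this description.

```python
import re
from typing import Optional

# "exp", participant number, then two task parts such as "c2e1" or "e2c1",
# optionally followed by "-<suffix>".
EXP_PATTERN = re.compile(
    r"^exp(?P<participant>\d+)-(?P<tasks>(?:[ce]\d+){2})(?:-(?P<suffix>.+))?$"
)


def parse_experiment_folder(name: str) -> Optional[dict]:
    """Return the parts of an experiment folder name, or None if it does not match."""
    match = EXP_PATTERN.match(name)
    if match is None:
        return None
    tasks = dict(re.findall(r"([ce])(\d+)", match.group("tasks")))
    if set(tasks) != {"c", "e"}:
        return None  # need exactly one control part and one experimental part
    return {
        "participant": int(match.group("participant")),
        "control_dataset": int(tasks["c"]),
        "experimental_dataset": int(tasks["e"]),
        "suffix": match.group("suffix"),
    }


if __name__ == "__main__":
    print(parse_experiment_folder("exp1-c2e1"))
    print(parse_experiment_folder("sandboxing1"))  # None: not an experiment folder
```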

Every experiment subfolder contains:

warmup.json: a JSON file with the results of Newton-Participant interactions in the chat for the warmup task.

warmup.ipynb: a Jupyter notebook file with the participant’s results from the code provided by Newton in the warmup task.

sample1.csv: Death Event dataset.

sample2.csv: Heart Disease dataset.

tool.ipynb: a Jupyter notebook file with the participant’s results from the code provided by Newton in the experimental task.

python.ipynb: a Jupyter notebook file with the participant’s results from the code they tried during the control task.

results.json: a JSON file with the results of Newton-Participant interactions in the chat for the task with Newton.

To load an experiment chat log into Newton, add the following code to the notebook:

import anachat
import json

with open("result.json", "r") as f:
    anachat.comm.COMM.history = json.load(f)

Then, click on the notebook name inside Newton chat

Note 1: the subfolder for P6 is exp6-e2c1-serverdied because the experiment server died before we were able to save the logs. We reconstructed them using the notebook newton_remake.ipynb based on the video recording.

Note 2: The sandboxing occurred during the development of Newton. We did not collect all the files, and the format of JSON files is different than the one supported by the attached version of Newton.

3. Responses.zip

The responses zip contains the following files:

demographics.csv: a CSV file containing the responses collected from participants using the demographics form

task_newton.csv: a CSV file containing the responses collected from participants using the post-task newton form.

task_control.csv: a CSV file containing the responses collected from participants using the post-task control form.

post_study.csv: a CSV file containing the responses collected from participants using the post-study control form.

4. Analysis.zip

The analysis zip contains the following files:

1.Challenge.ipynb: a Jupyter notebook file that performs the statistical tests and creates the perceptions of challenges figure.

2.Interactions.py: a Python file that creates the participants’ JSON files.

3.Interactions.Graph.ipynb: a Jupyter notebook file that creates the participant’s interaction figure.

4.Interactions.Count.ipynb: a Jupyter notebook file that counts participants’ interaction with each figure.

config_interactions.py: this file contains the definitions of interaction colors and grouping

interactions.json: a JSON file with the interactions during the Newton task of each participant based on the categorization.

requirements.txt: dependencies required to run the code to generate the graphs and json analysis.

To run the analyses, please follow the steps:

1- Extract Analysis.zip and cd into the directory

2- Install Python 3.10, and then the analysis dependencies with the following command:

pip install -r requirements.txt

3- Run Jupyter Notebook/Lab and execute all cells of 1.Challenge.ipynb. It will generate the challenges figure.

4- Run 2.Interactions.py using the following command:

python 2.Interactions.py

This file was created manually by individually categorizing each interaction of the participants. The execution will generate the file interactions.json with the graph definitions of the interactions.

5- Run Jupyter Notebook/Lab and execute all cells of 3.Interactions.Graph.ipynb. It will create the interactions graph visualization.

6- Run Jupyter Notebook/Lab and execute all cells of 4.Interactions.Count.ipynb. It will create the interactions table.

5. newton.zip

The newton zip contains the source code of the Jupyter Lab extension we used in the experiments. Read the README.md file inside it for instructions on how to install and run it.
2020 Economic Surveys: AB00MYNESD01A | Nonemployer Statistics by...
data.census.gov
Updated Feb 8, 2024
+ more versions
ECN (2024). 2020 Economic Surveys: AB00MYNESD01A | Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Sex for the U.S., States, and Metro Areas: 2020 (ECNSVY Nonemployer Statistics by Demographics Company Summary) [Dataset]. https://data.census.gov/table/ABSNESD2020.AB00MYNESD01A?q=grocery+stores+in+georgia+in+2020
Explore at:
Dataset updated
Feb 8, 2024
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
ECN
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2020
Area covered
United States
Description
Key Table Information

Table Title
Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Sex for the U.S., States, and Metro Areas: 2020

Table ID
ABSNESD2020.AB00MYNESD01A

Survey/Program
Economic Surveys

Year
2020

Dataset
ECNSVY Nonemployer Statistics by Demographics Company Summary

Source
U.S. Census Bureau, 2020 Economic Surveys, Nonemployer Statistics by Demographics

Release Date
2024-02-08

Release Schedule
The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.

Sponsor
National Center for Science and Engineering Statistics, U.S. National Science Foundation

Table Universe
Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms).
Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series).
Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2021 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2017 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered.
Data are also obtained from administrative records and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2021 ABS collection year produces statistics for the 2020 reference year. The "Year" column in the table is the reference year.

Methodology

Data Items and Other Identifying Records
Total number of employer and nonemployer firms
Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
Number of nonemployer firms
Sales, value of shipments, or revenue of nonemployer firms ($1,000)
Number of employer firms
Sales, value of shipments, or revenue of employer firms ($1,000)
Number of employees
Annual payroll ($1,000)

These data are aggregated by the following demographic classifications of firm for:
All firms
Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
Sex
Female
Male
Equally male-owned and female-owned
Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)

Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.

Unit(s) of Observation
The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.

Geography Coverage
The data are shown for the total of all sectors (00) and the 2-digit NAICS code levels for:
United States
States and the District of Columbia
Metropolitan Statistical Areas

Data are also shown for the 3- and 4-digit NAICS code for:
United States
States and the District of Columbia

For information about geographies, see Geographies.

Industry Coverage
The data are shown for the total of all sectors ("00"), and at the 2- through 4-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors. Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.

The following are excluded from the total of all sectors:
Crop and Animal Production (NAICS 111 and 112)
Rail Transportation (NAICS 482)
Postal Service (NAICS 491)
Monetary Authorities-Central Bank (NAICS 521)
Funds, Trusts, and Other Financial Vehicles (NAICS 525)
Private Households (NAICS 814)
Public Administration (NAICS 92)

For information about NAICS, see North American Industry Classification System.

Sampling
NES-D nonemployer data are not conducted through sampling. Nonemployer Statistics (NES) data originate from statistical information obtained through business income tax records that the Internal Revenue Service (IRS) provides to the Census Bureau. The NES-D adds demographic characteristics to the NES data and produces the total firm counts and the total receipts by those demographic characteristics. The NES-D utilizes various administrative records (AR) and the Census Bureau data sources that inc...
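The industry exclusions listed for the 2020 all-sectors total can be expressed as a simple prefix check. This is a sketch under assumptions: the function name is hypothetical, codes are passed as strings, and treating every code that begins with an excluded code as out of scope is an interpretation rather than something the table documentation states.

```python
# NAICS codes excluded from the sector "00" total, as listed in this description.
EXCLUDED_PREFIXES = ("111", "112", "482", "491", "521", "525", "814", "92")


def in_all_sectors_total(naics_code: str) -> bool:
    """Return True if a NAICS code is not under one of the excluded codes.

    Prefix matching is an assumption of this sketch. A broader code such as
    "11" is reported as in scope because only parts of it are excluded.
    """
    return not naics_code.startswith(EXCLUDED_PREFIXES)


if __name__ == "__main__":
    for code in ("4451", "482", "9211", "11"):
        print(code, in_all_sectors_total(code))
```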
Synthetic genomic data
kaggle.com
Updated Apr 23, 2023
Oleg Bushuev (2023). Synthetic genomic data [Dataset]. https://www.kaggle.com/datasets/oubush/synthetic-genomic-data
Explore at:
Dataset updated
Apr 23, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Oleg Bushuev
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises images representing animal genotypes and offers a unique opportunity to delve into the realm of image processing techniques applied to genomic analysis. The original genomic data were sourced from Daniela Lourenco's GitHub repository https://github.com/danielall/Data_ssGBLUP, which contains data used as examples in the paper entitled "Single-step genomic evaluations from theory to practice: using SNP chips and sequence data in blupf90" by Lourenco et al. (2020). According to the data description, these data were simulated using QMSim (Sargolzaei & Schenkel, 2009). All the genetic variance was explained by 500 QTL. Animals were genotyped for 45,000 SNP and the average LD was 0.18. 2024 animals have genotypes and phenotypes. SNP genotype is coded based on the number of copies of the alternative allele (0, 1, 2).

Simulation details
Data were simulated using the software QMSim (Sargolzaei and Schenkel, 2009). In the first simulation step, 200 generations of the historical population were simulated to create mutation and drift equilibrium and linkage disequilibrium (LD). This historical population started from 50,000 individuals and decreased to 2,100 in the last generation, with an equal proportion of males and females. The second step generated an expanded population, which started with 10 males and 2000 females from the last historical generation. Each one of the 2000 females was randomly mated and produced 1 offspring per generation. Sire and dam were randomly replaced over 20 generations, and the replacement was 50% and 20%, respectively. The third step was used to generate the recent population that had the same parameters as the expansion population. Five generations were simulated, and all animals were genotyped. Only data from the recent population were used, which comprised pedigree information and phenotypes for 10,000 animals, and genotypes for 1020 parents from generations 1-4 and 1004 individuals in generation 5. For the genome, 29 chromosomes with a total of 2319 cM were simulated. Each chromosome had a similar number of SNP as the BovineSNP50k BeadChip (Illumina Inc., San Diego, CA). Although the number of simulated SNP was 54,000, nearly 45,000 passed the quality control and remained in the analyses. Along with SNP, 500 biallelic QTL were randomly placed on chromosomes. The QTL effects were sampled from a gamma distribution. The QTL and SNP had recurrent mutations with a probability of 2.5 × 10^-5.
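The 0/1/2 genotype coding mentioned in the description counts copies of the alternative allele. A minimal sketch of that coding follows; the two-letter genotype strings and the function name are assumptions made for illustration, since the dataset itself ships already-coded values.

```python
def encode_genotype(allele_pair: str, alt_allele: str) -> int:
    """Count copies of the alternative allele in a two-allele genotype such as 'AG'."""
    if len(allele_pair) != 2:
        raise ValueError("expected exactly two alleles")
    return sum(1 for allele in allele_pair if allele == alt_allele)


if __name__ == "__main__":
    # Hypothetical SNP with reference allele A and alternative allele G.
    for genotype in ("AA", "AG", "GG"):
        print(genotype, encode_genotype(genotype, "G"))
```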
Namibia Population and Housing Census 2011 - Namibia
microdata.nsanamibia.com
Updated Sep 30, 2024
+ more versions
Namibia Statistics Agency (2024). Namibia Population and Housing Census 2011 - Namibia [Dataset]. https://microdata.nsanamibia.com/index.php/catalog/9
Explore at:
Dataset updated
Sep 30, 2024
Dataset authored and provided by
Namibia Statistics Agency (https://nsa.org.na/)
Time period covered
2011
Area covered
Namibia
Description
Abstract

The 2011 Population and Housing Census is the third national Census to be conducted in Namibia after independence. The first was conducted in 1991, followed by the 2001 Census. Namibia is therefore one of the countries in sub-Saharan Africa that has participated in the 2010 Round of Censuses and followed the international best practice of conducting decennial Censuses, each of which attempts to count and enumerate every person and household in a country every ten years. Surveys, by contrast, collect data from samples of people and/or households.

Censuses provide reliable and critical data on the socio-economic and demographic status of any country. In Namibia, Census data has provided crucial information for development planning and programme implementation. Specifically, the information has assisted in setting benchmarks, formulating policy and the evaluation and monitoring of national development programmes including NDP4, Vision 2030 and several sector programmes. The information has also been used to update the national sampling frame which is used to select samples for household-based surveys, including labour force surveys, demographic and health surveys, household income and expenditure surveys. In addition, Census information will be used to guide the demarcation of Namibia's administrative boundaries where necessary.

At the international level, Census information has been used extensively in monitoring progress towards Namibia's achievement of international targets, particularly the Millennium Development Goals (MDGs).

The latest and most comprehensive Census was conducted in August 2011. Preparations for the Census started in the 2007/2008 financial year under the auspices of the then Central Bureau of Statistics (CBS) which was later transformed into the Namibia Statistics Agency (NSA). The NSA was established under the Statistics Act No. 9 of 2011, with the legal mandate and authority to conduct population Censuses every 10 years. The Census was implemented in three broad phases; pre-enumeration, enumeration and post enumeration.

During the first, pre-enumeration phase, the activities accomplished included the preparation of a project document, establishing Census management and technical committees, and establishing the Census cartography unit which demarcated the Enumeration Areas (EAs). Other activities included the development of Census instruments and tools, such as the questionnaires, manuals and field control forms.

Field staff were recruited, trained and deployed during the initial stages of the enumeration phase. The actual enumeration exercise was undertaken over a period of about three weeks from 28 August to 15 September 2011, while 28 August 2011 was marked as the reference period or 'Census Day'.

Great efforts were made to check and ensure that the Census data was of high quality to enhance its credibility and increase its usage. Various quality controls were implemented to ensure relevance, timeliness, accuracy, coherence and proper data interpretation. Other activities undertaken to enhance quality included the demarcation of the country into small enumeration areas to ensure comprehensive coverage; the development of structured Census questionnaires after consultation ...

The post-enumeration phase started with the sending of completed questionnaires to Head Office and the preparation of summaries for the preliminary report, which was published in April 2012. Processing of the Census data began with manual editing and coding, which focused on the household identification section and un-coded parts of the questionnaire. This was followed by the capturing of data through scanning. Finally, the data were verified and errors corrected where necessary. This took longer than planned due to inadequate technical skills.

Geographic coverage

National coverage

Analysis unit

Households and persons

Universe

The sampling universe is defined as all households (private and institutions) from the 2011 Census dataset.

Kind of data

Census/enumeration data [cen]

Sampling procedure

Sample Design

For the Public Use Microdata Sample (PUMS) file, a stratified random sample was drawn from the household list of the Namibia 2011 Population and Housing Census, stratified by constituency and urban/rural status. The sampling universe is defined as all households (private and institutions) from the 2011 Census dataset. Because the urban/rural distinction is a very important factor in the Namibian context, strata were defined at the constituency and urban/rural levels. Some constituencies have very few households in their urban or rural part, so the office set a threshold (lower bound) for sampling within a stratum. Based on data analysis, the threshold for a stratum of the PUMS file is 250 households: all households in constituency and urban/rural areas with fewer than 250 households in total were included in the PUMS file. In every other stratum, simple random sampling (SRS) at a 20% sampling rate was applied. The sampled households include 93,674 housing units and 418,362 people.

Sample Selection

The PUMS sample is selected from households. The PUMS sample of persons in households is selected by keeping all persons in PUMS households. The sample selection process is performed using the Census and Survey Processing System (CSPro).

The sample selection program first identifies the 7 census strata with fewer than 250 households and the households (private and institutions) with more than 50 people. All households in these strata, and all households of this size, are included in the sample. For the other households, the program randomly generates a number n from 0 to 4. Out of every 5 households, the program selects the nth household to export to the PUMS data file, creating a 20 percent sample of households. Private households and institutions are sampled equally in the PUMS data file.

Note: The 7 census strata with fewer than 250 households are: Arandis Constituency Rural, Rehoboth East Urban Constituency Rural, Walvis Bay Rural Constituency Rural, Mpungu Constituency Urban, Etayi Constituency Urban, Kalahari Constituency Urban, and Ondobe Constituency Urban.
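The selection rules described above can be illustrated with a short Python sketch. It is not the CSPro program used by the NSA; the record keys (`id`, `stratum`, `persons`), the function name, and the choice to draw n once per stratum are assumptions made for illustration only.

```python
import random
from collections import defaultdict

STRATUM_THRESHOLD = 250   # strata with fewer households are taken in full
LARGE_HOUSEHOLD = 50      # households with more people are always kept
STEP = 5                  # one household in five gives a 20 percent sample


def select_pums_households(households, seed=None):
    """Return the ids of households selected for a PUMS-style file.

    `households` is an iterable of dicts with the hypothetical keys
    'id', 'stratum' (constituency x urban/rural) and 'persons'.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for hh in households:
        by_stratum[hh["stratum"]].append(hh)

    selected = []
    for members in by_stratum.values():
        # Small strata: include every household.
        if len(members) < STRATUM_THRESHOLD:
            selected.extend(hh["id"] for hh in members)
            continue
        # Assumption: n (0 to 4) is drawn once per stratum.
        n = rng.randrange(STEP)
        position = 0
        for hh in members:
            # Very large households and institutions: always include.
            if hh["persons"] > LARGE_HOUSEHOLD:
                selected.append(hh["id"])
                continue
            # Out of every 5 remaining households, keep the nth one.
            if position % STEP == n:
                selected.append(hh["id"])
            position += 1
    return selected
```

With a stratum of 100 households and a stratum of 1,000 ordinary households plus one 60-person institution, the sketch keeps all 100, exactly 200 of the 1,000, and the institution.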

Mode of data collection

Face-to-face [f2f]

Research instrument

The following questionnaire instruments were used for the Namibia 2011 Population and Housing Census:

Form A (Long Form): For conventional households and residential institutions

Form B1 (Short Form): For special population groups such as persons in transit (travellers), police cells, homeless and off-shore populations

Form B2 (Short Form): For hotels/guesthouses

Form B3 (Short Form): For foreign missions/diplomatic corps

Cleaning operations

Data editing took place at a number of stages throughout the processing, including:
a) During data collection in the field
b) Manual editing and coding in the office
c) During data entry (primary validation/editing): structure checking and completeness using a Structured Query Language (SQL) program
d) Secondary editing:
   i. Imputation of variables
   ii. Structural checking in the Census and Survey Processing System (CSPro) program

Sampling error estimates

Sampling Error

The standard errors of survey estimates are needed to evaluate the precision of the estimates. A statistical software package such as SPSS or SAS can estimate the mean and variance of estimates from the survey. Both packages use the Taylor series approach to compute the variance.
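As a rough illustration of the Taylor series (linearization) idea, the Python sketch below estimates a ratio of two totals and its standard error under stratified simple random sampling without replacement. It is a simplified sketch, not the routine implemented in SPSS or SAS; the input layout (`N`, `y`, `x`) and the function name are assumptions, and fully enumerated strata are treated as contributing no sampling variance.

```python
from math import sqrt
from statistics import variance


def ratio_with_linearized_se(strata):
    """Estimate R = Y / X and its linearized standard error.

    `strata` is a list of dicts with the hypothetical keys:
      'N' - number of households in the stratum population
      'y' - sampled values of the numerator variable
      'x' - sampled values of the denominator variable (same length as 'y')
    """
    # Weighted totals: each sampled household represents N / n households.
    y_total = sum(s["N"] / len(s["y"]) * sum(s["y"]) for s in strata)
    x_total = sum(s["N"] / len(s["x"]) * sum(s["x"]) for s in strata)
    ratio = y_total / x_total

    var = 0.0
    for s in strata:
        n, N = len(s["y"]), s["N"]
        if n == N:
            continue  # take-all stratum: no sampling variance
        # Linearized variable from the first-order Taylor expansion of Y / X.
        z = [(y - ratio * x) / x_total for y, x in zip(s["y"], s["x"])]
        fpc = 1 - n / N  # finite population correction
        var += N ** 2 * fpc * variance(z) / n
    return ratio, sqrt(var)
```

When every household has the same y/x ratio, the linearized variable is zero everywhere and the estimated standard error is zero, which is a quick sanity check on the formula.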

Data appraisal

Data quality

Great efforts were made to check and ensure that the Census data was of high quality to enhance its credibility and increase its usage. Various quality controls were implemented to ensure relevance, timeliness, accuracy, coherence and proper data interpretation. Other activities undertaken to enhance quality included the demarcation of the country into small enumeration areas to ensure comprehensive coverage; the development of structured Census questionnaires after consultation with government ministries, university expertise and international partners; the preparation of detailed supervisors' and enumerators' instruction manuals to guide field staff during enumeration; the undertaking of comprehensive publicity and advocacy programmes to ensure full Government support and cooperation from the general public; the testing of questionnaires and other procedures; the provision of adequate training and undertaking of intensive supervision using four supervisory layers; the editing of questionnaires at field level; establishing proper mechanisms which ensured that all completed questionnaires were properly accounted for; ensuring intensive verification, validating all information and error corrections; and developing capacity in data processing with support from the international community.
Random Forest (RF), Gradient Boosting (GB), and AdaBoost (AB) methods:...
plos.figshare.com
xls
Updated Nov 24, 2023
Cite
Soroosh Shalileh; Dmitry Ignatov; Anastasiya Lopukhina; Olga Dragoy (2023). Random Forest (RF), Gradient Boosting (GB), and AdaBoost (AB) methods: Hyperparameters’ domain and the corresponding tuned values at the data sets under consideration. [Dataset]. http://doi.org/10.1371/journal.pone.0292047.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0292047.t004
Dataset updated
Nov 24, 2023
Dataset provided by
PLOS ONE
Authors
Soroosh Shalileh; Dmitry Ignatov; Anastasiya Lopukhina; Olga Dragoy
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ne, Mss, Msl, and lr denote, respectively, the number of estimators, the minimum number of samples per split, the minimum number of samples per leaf, and the learning rate.
2022 Economic Surveys: AB00MYNESD01A | Nonemployer Statistics by...
data.census.gov
test.data.census.gov
Updated May 13, 2025
Cite
ECN (2025). 2022 Economic Surveys: AB00MYNESD01A | Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Sex for the U.S., States, Metro Areas, Counties, and Places: 2022 (ECNSVY Nonemployer Statistics by Demographics Company Summary) [Dataset]. https://data.census.gov/all/tables?q=D'Zar%20Motors
Dataset updated
May 13, 2025
Dataset provided by
United States Census Bureau http://census.gov/
Authors
ECN
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2022
Area covered
United States
Description
Key Table Information

Table Title: Nonemployer Statistics by Demographics series (NES-D): Statistics for Employer and Nonemployer Firms by Industry and Sex for the U.S., States, Metro Areas, Counties, and Places: 2022
Table ID: ABSNESD2022.AB00MYNESD01A
Survey/Program: Economic Surveys
Year: 2022
Dataset: ECNSVY Nonemployer Statistics by Demographics Company Summary
Source: U.S. Census Bureau, 2022 Economic Surveys, Nonemployer Statistics by Demographics
Release Date: 2025-05-08
Release Schedule: The Nonemployer Statistics by Demographics (NES-D) is released yearly, beginning in 2017.
Sponsor: National Center for Science and Engineering Statistics, U.S. National Science Foundation

Table Universe: Data in this table combines estimates from the Annual Business Survey (employer firms) and the Nonemployer Statistics by Demographics (nonemployer firms).
Includes U.S. firms with no paid employment or payroll, annual receipts of $1,000 or more ($1 or more in the construction industries) and filing Internal Revenue Service (IRS) tax forms for sole proprietorships (Form 1040, Schedule C), partnerships (Form 1065), or corporations (the Form 1120 series).
Includes U.S. employer firms estimates of business ownership by sex, ethnicity, race, and veteran status from the 2023 Annual Business Survey (ABS) collection. The employer business dataset universe consists of employer firms that are in operation for at least some part of the reference year, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees and annual receipts of $1,000 or more, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS), except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered.
Data are also obtained from administrative records, the 2022 Economic Census, and other economic surveys.
Note: For employer data only, the collection year is the year in which the data are collected. A reference year is the year that is referenced in the questions on the survey and in which the statistics are tabulated. For example, the 2023 ABS collection year produces statistics for the 2022 reference year. The "Year" column in the table is the reference year.

Methodology

Data Items and Other Identifying Records:
Total number of employer and nonemployer firms
Total sales, value of shipments, or revenue of employer and nonemployer firms ($1,000)
Number of nonemployer firms
Sales, value of shipments, or revenue of nonemployer firms ($1,000)
Number of employer firms
Sales, value of shipments, or revenue of employer firms ($1,000)
Number of employees
Annual payroll ($1,000)

These data are aggregated by the following demographic classifications of firm for:
All firms
Classifiable (firms classifiable by sex, ethnicity, race, and veteran status)
Sex
Female
Male
Equally male-owned and female-owned
Unclassifiable (firms not classifiable by sex, ethnicity, race, and veteran status)

Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.

Unit(s) of Observation: The reporting units for the NES-D and the ABS are companies or firms rather than establishments. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization.

Geography Coverage: The 2022 data are shown for the total of all sectors (00) and the 2- to 6-digit NAICS code levels for:
United States
States and the District of Columbia
In addition, the total of all sectors (00) NAICS and the 2-digit NAICS code levels for:
Metropolitan Statistical Areas
Micropolitan Statistical Areas
Metropolitan Divisions
Combined Statistical Areas
Counties
Economic Places
For information about geographies, see Geographies.

Industry Coverage: The data are shown for the total of all sectors ("00"), and at the 2- through 6-digit NAICS code levels depending on geography. Sector "00" is not an official NAICS sector but is rather a way to indicate a total for multiple sectors. Note: Other programs outside of ABS may use sector 00 to indicate when multiple NAICS sectors are being displayed within the same table and/or dataset.
The following are excluded from the total of all sectors:
Crop and Animal Production (NAICS 111 and 112)
Rail Transportation (NAICS 482)
Postal Service (NAICS 491)
Monetary Authorities-Central Bank (NAICS 521)
Funds, Trusts, and Other Financial Vehicles (NAICS 525)
Office of Notaries (NAICS 541120)
Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
Private Households (NAICS 814)
Public Administration (NAICS 92)
For information about NAICS, see North American Industry Classification System.

Sampling: NES-D nonemployer data are not conducted through sampling. Nonemployer Statistics (NES) data originate from statistical information obtained through business income tax records that the Internal Revenue Service (IRS) provides to the Census Bureau. ...
National Survey of the Japanese Elderly
dknet.org
neuinfo.org
Updated Sep 22, 2023
Cite
(2023). National Survey of the Japanese Elderly [Dataset]. http://identifiers.org/RRID:SCR_008971
Unique identifier
https://identifiers.org/RRID:SCR_008971
Dataset updated
Sep 22, 2023
Description
A panel data set for use in cross-cultural analyses of aging, health, and well-being between the U.S. and Japan. The questionnaires were designed to be partially comparable to many surveys of the aged, including Americans' Changing Lives; 1984 National Health Interview Survey Supplement on Aging; Health and Retirement Study (HRS), and Well-Being Among the Aged: Personal Control and Self-Esteem (WBA).

NSJE questionnaire topics include:
* Demographics (age, sex, marital status, education, employment)
* Social Integration (interpersonal contacts, social supports)
* Health Limitations on daily life and activities
* Health Conditions
* Health Status (ratings of present health)
* Level of physical activity
* Subjective Well-Being and Mental Health Status (life satisfaction, morale)
* Psychological Indicators (life events, locus of control, self-esteem)
* Financial situation (financial status)
* Memory (measures of cognitive functioning)
* Interviewer observations (assessments of respondents)

The NSJE was based on a national sample of 2,200 noninstitutionalized elderly aged 60+ in Japan. This cohort has been interviewed once every 3 years since 1987. To ensure that the data are representative of the 60+ population, the samples in 1990 and 1996 were refreshed to add individuals aged 60-62. In 1999, a new cohort of Japanese adults aged 70+ was added to the surviving members of previous cohorts to form a database of 3,990 respondents 63+, of which some 3,000 were 70+. Currently a 6-wave longitudinal database (1987, 1990, 1993, 1996, 1999, & 2002) is in place; wave 7 began in 2006.

Data Availability: Data from the first three waves of the National Survey of the Japanese Elderly are currently in the public domain and can be obtained from ICPSR. Additional data are being prepared for future public release.

* Dates of Study: 1987-2006
* Study Features: Longitudinal, International
* Sample Size:
** 1987: 2,200
** 1990: 2,780
** 1993: 2,780
** 1996:
** 1999: 3,990
** 2002:
** 2006:

Links:
* 1987 (ICPSR): http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06842
* 1990 (ICPSR): http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/03407
* 1993 (ICPSR): http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/04145
* 1996 (ICPSR): http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/26621

