Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in La Cañada Flintridge, CA, as reported by the U.S. Census Bureau. It highlights how median household income varies with household size, offering insight into economic trends and disparities across household sizes and aiding data analysis and decision-making.
Key observations
Chart: La Cañada Flintridge, CA median household income, by household size (in 2022 inflation-adjusted dollars) (image: https://i.neilsberg.com/ch/la-canada-flintridge-ca-median-household-income-by-household-size.jpeg)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
The data in this dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for La Cañada Flintridge median household income, which you can refer to here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Williams Bay, WI, as reported by the U.S. Census Bureau. It highlights how median household income varies with household size, offering insight into economic trends and disparities across household sizes and aiding data analysis and decision-making.
Key observations
Chart: Williams Bay, WI median household income, by household size (in 2022 inflation-adjusted dollars) (image: https://i.neilsberg.com/ch/williams-bay-wi-median-household-income-by-household-size.jpeg)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
The data in this dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Williams Bay median household income, which you can refer to here.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Real Median Personal Income in the United States (MEPAINUSA672N) from 1974 to 2023 about personal income, personal, median, income, real, and USA.
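For readers who want the series programmatically rather than via the FRED graph page, here is a minimal sketch using the pandas-datareader package (an assumption on my part; any FRED client would work), with the series ID taken from the description above and an illustrative date range:

```python
# Minimal sketch: fetch Real Median Personal Income (MEPAINUSA672N) from FRED.
# Requires the pandas-datareader package; the date range below is illustrative.
from datetime import datetime

import pandas_datareader.data as web

series = web.DataReader(
    "MEPAINUSA672N", "fred",
    start=datetime(1974, 1, 1),
    end=datetime(2023, 12, 31),
)
print(series.tail())
```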
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wages in China increased to 120,698 CNY/year in 2023 from 114,029 CNY/year in 2022. This dataset provides China Average Yearly Wages: actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Greensboro. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.
Key observations: Insights from 2022
Based on our analysis of ACS 2022 1-Year Estimates, we present the following observations: - All workers, aged 15 years and older: In Greensboro, the median income for all workers aged 15 years and older, regardless of work hours, was $37,291 for males and $26,937 for females.
These income figures indicate a substantial gender-based pay disparity: a gap of approximately 28% between the median incomes of males and females in Greensboro. With women, regardless of work hours, earning 72 cents for each dollar earned by men, this income disparity reveals a concerning trend toward wage inequality that demands attention in the city of Greensboro.
- Full-time workers, aged 15 years and older: In Greensboro, among full-time, year-round workers aged 15 years and older, males earned a median income of $53,807, while females earned $41,696, a 23% gender pay gap among full-time workers. This means women earn 77 cents for each dollar earned by men in full-time roles: a substantial income disparity persists even for women working full-time. Notably, across all roles, including non-full-time employment, women faced a similar gender pay gap percentage, indicating a consistent income pattern irrespective of employment type in Greensboro. The arithmetic behind these percentages is sketched below.
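For reference, the quoted gap percentages follow directly from the reported medians; a minimal sketch of the arithmetic:

```python
# Reproduce the quoted gender pay gap percentages from the reported medians.
def pay_gap(male_median: float, female_median: float) -> float:
    """Return the gap as a percentage of the male median."""
    return (1 - female_median / male_median) * 100

# All workers, aged 15+: $37,291 (male) vs $26,937 (female) -> ~28% gap (72 cents per dollar)
print(round(pay_gap(37_291, 26_937)))  # 28
# Full-time, year-round workers: $53,807 vs $41,696 -> ~23% gap (77 cents per dollar)
print(round(pay_gap(53_807, 41_696)))  # 23
```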
Chart: Greensboro, NC gender-based income disparity (image: https://i.neilsberg.com/ch/greensboro-nc-income-by-gender.jpeg)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2022 1-Year Estimates. All incomes have been adjusted for inflation and are presented in 2022 inflation-adjusted dollars.
Gender classifications include:
Employment type classifications include:
Variables / Data Columns
Good to know
Margin of Error
The data in this dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Greensboro median household income by gender, which you can refer to here.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
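As a rough illustration of that join, here is a minimal pandas sketch. It assumes the Meta Kaggle KernelVersions.csv has been downloaded next to this dataset; the only thing taken from the description above is that the code file names match the KernelVersions ids, so verify column names against the actual CSV header.

```python
import pandas as pd

# Assumed file name from the Meta Kaggle dataset; verify locally.
kernel_versions = pd.read_csv("KernelVersions.csv")

# A Meta Kaggle Code file such as 123/456/123456789.ipynb (hypothetical id)
# is keyed by its KernelVersions id, so its metadata row can be looked up directly.
# "Id" is the assumed id column name; check the CSV header before relying on it.
code_file_id = 123456789
metadata = kernel_versions[kernel_versions["Id"] == code_file_id]
print(metadata)
```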
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files, e.g. folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub-folder contains up to 1 thousand files, e.g. 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions. A sketch of this id-to-path mapping is shown below.
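Based on the layout just described, here is a small sketch of how a version id maps to its two-level folder. Whether folder names are zero-padded and which file extension a given version uses are details to verify against the actual files, so treat this as an approximation.

```python
def kernel_version_dir(version_id: int) -> str:
    """Map a KernelVersions id to its two-level folder per the layout above.

    e.g. id 123,456,789 lives under folder 123 (the millions) and
    sub-folder 456 (the thousands within that million).
    """
    top = version_id // 1_000_000
    sub = (version_id // 1_000) % 1_000
    return f"{top}/{sub}"


print(kernel_version_dir(123_456_789))  # -> "123/456"
```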
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Loudoun County. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.
Key observations: Insights from 2022
Based on our analysis of ACS 2022 1-Year Estimates, we present the following observations: - All workers, aged 15 years and older: In Loudoun County, the median income for all workers aged 15 years and older, regardless of work hours, was $96,408 for males and $50,183 for females.
These income figures highlight a substantial gender-based income gap in Loudoun County. Women, regardless of work hours, earn 52 cents for each dollar earned by men. This significant gender pay gap of approximately 48% underscores concerning gender-based income inequality in Loudoun County.
- Full-time workers, aged 15 years and older: In Loudoun County, among full-time, year-round workers aged 15 years and older, males earned a median income of $124,133, while females earned $87,582, a 29% gender pay gap among full-time workers. This means women earn 71 cents for each dollar earned by men in full-time roles: a substantial income disparity persists even among full-time workers. Notably, the gender pay gap across all roles, including non-full-time employment, was larger than the full-time gap. This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Loudoun County.
Chart: Loudoun County, VA gender-based income disparity (image: https://i.neilsberg.com/ch/loudoun-county-va-income-by-gender.jpeg)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2022 1-Year Estimates. All incomes have been adjusted for inflation and are presented in 2022 inflation-adjusted dollars.
Gender classifications include:
Employment type classifications include:
Variables / Data Columns
Good to know
Margin of Error
The data in this dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Loudoun County median household income by gender, which you can refer to here.
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting off the median exposure amount for a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means:

File format: R workspace file; "Simulated_Dataset.RData".

Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code Abstract: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description:
• "CWVS_LMC.txt": This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
• "Results_Summary.txt": This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Required R packages:
• For running "CWVS_LMC.txt": msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
• For running "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)

Instructions for Use / Reproducibility
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 of the presented simulation study. Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:
• Load the "Simulated_Dataset.RData" workspace
• Run the code contained in "CWVS_LMC.txt"
• Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt"

Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The main goal of this model is to help me create an app that counts how much money a picture contains.
Descriptions of each class type
I don't separate coins by issuing country, and I don't separate front and back faces.
EUR-1-cent
EUR-2-cent
EUR-5-cent
EUR-10-cent
EUR-20-cent
EUR-50-cent
EUR-1-euro
EUR-2-euro
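Since the stated goal is to total up the money visible in a picture, here is a minimal sketch of that final step given the class list above; the detection step itself (whatever model produces the class labels) is assumed.

```python
# Map each detection class to its value in euros and total up a picture.
COIN_VALUES = {
    "EUR-1-cent": 0.01,
    "EUR-2-cent": 0.02,
    "EUR-5-cent": 0.05,
    "EUR-10-cent": 0.10,
    "EUR-20-cent": 0.20,
    "EUR-50-cent": 0.50,
    "EUR-1-euro": 1.00,
    "EUR-2-euro": 2.00,
}


def total_money(detected_classes: list[str]) -> float:
    """Sum the value of all coin classes detected in one picture."""
    return round(sum(COIN_VALUES[c] for c in detected_classes), 2)


# e.g. a picture with one 2-euro coin, one 50-cent coin and two 10-cent coins
print(total_money(["EUR-2-euro", "EUR-50-cent", "EUR-10-cent", "EUR-10-cent"]))  # 2.7
```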
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains information about high school students and their actual and predicted performance on an exam. Most of the information, including general information about the students and their grade on an exam, was based on an already existing dataset, while the predicted exam performance was based on a human experiment. In this experiment, participants were shown short descriptions of the students (based on the information in the original data) and had to rank and grade them according to their expected performance. Prior to this task, some participants were exposed to a "Stereotype Activation" manipulation suggesting that boys perform less well in school than girls.
Based on this dataset (which is also available on kaggle), we extracted a number of student profiles that participants had to make grade predictions for. For more information about this dataset we refer to the corresponding kaggle page: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption
Note that we performed some preprocessing on the original data (a pandas sketch of these steps follows the list):
The original data consisted of two parts: the information about students following a Maths course and the information about students following a Portuguese course. Since in both datasets the same type of information was recorded, we merged both datasets and added a column "subject", to show which course each student belongs to
We excluded all data where G3 = 0 (i.e. the grade for the last exam = 0)
From original_data.csv we randomly sampled 856 students that participants in our study had to make grade predictions for.
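A minimal pandas sketch of the preprocessing steps listed above; the input file names and the ';' separator follow the original UCI/Kaggle student datasets, so treat them as assumptions to verify, and the random seed is arbitrary.

```python
import pandas as pd

# Assumed file names/separator from the original Kaggle/UCI release -- verify locally.
maths = pd.read_csv("student-mat.csv", sep=";")
portuguese = pd.read_csv("student-por.csv", sep=";")

# 1) Merge the two courses and record which course each student belongs to.
maths["subject"] = "Maths"
portuguese["subject"] = "Portuguese"
students = pd.concat([maths, portuguese], ignore_index=True)

# 2) Exclude rows where the final exam grade G3 is 0.
students = students[students["G3"] != 0].reset_index(drop=True)

# 3) Randomly sample 856 students for the grade-prediction experiment.
sampled = students.sample(n=856, random_state=0)
print(len(sampled))
```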
index - this column corresponds to the indices in the file "original_data.csv". Through these indices, it is possible to add columns from the original data to the dataset with the grade predictions
ParticipantID - the ID of the participant who made the performance predictions for the corresponding student. Predictions needed to be made for 856 students, and each participant made 8 predictions total. Thus there are 107 different participant IDs
name - to make the prediction task more engaging for participants, each of the 8 student profiles that participants had to grade and rank was randomly matched to one of four boys' or girls' names (depending on the sex of the student)
sex - the sex of each student, either female (F) or male (M). For benchmarking fair ML algorithms, this can be used as the sensitive attribute. We assume that in the fair version of the decision variable ("Pass"), no sex discrimination occurs. The biased versions of the variable ("Predicted Pass") are mostly discriminatory towards male students.
studytime - this variable is taken from the original dataset and denotes how long a student studied for their exam. In the original data this variable consisted of four levels (less than 2 hours vs. 2-5 hours vs. 5-10 hours vs. more than 10 hours). We binned the latter two levels together and encoded this column numerically from 1-3.
freetime - Originally, this variable ranged from 1 (very low) to 5 (very high). We binned this variable into three categories, where level 1 and 2 are binned, as well as level 4 and 5.
romantic - Binary variable, denoting whether the student is in a romantic relationship or not.
Walc - This variable shows how much alcohol each student consumes in the weekend. Originally it ranged from 1 to 5 (5 corresponding to the highest alcohol consumption), but we binned the last two levels together.
goout - This variable shows how often a student goes out in a week. Originally it ranged from 1 to 5 (5 corresponding to going out very often), but we binned the last two levels together.
Parents_edu - This variable was not present in the original dataset. Instead, the original dataset contained two variables, "mum_edu" and "dad_edu". We obtained "Parents_edu" by taking the higher of the two. The variable consists of 4 levels, where 4 = highest level of education.
absences - This variable shows the number of absences per student. Originally it ranged from 0 to 93, but because large numbers of absences were infrequent, we binned all absences of >= 7 into one level.
reason - The reason why a student chose to go to the school in question. The levels are: close to home, school's reputation, school's curriculum, and other
G3 - The actual grade each student received for the final exam of the course, ranging from 0-20.
Pass - A binary variable showing whether G3 is a passing grade (i.e. >=10) or not.
Predicted Grade - The grade the student was predicted to receive in our experiment
Predicted Rank - In our ex...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered where the book is "101 great ways to sew a metre : look how much you can make with just one metre of fabric!". It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This data is used for a broadband mapping initiative conducted by the Washington State Broadband Office. This dataset provides global fixed broadband and mobile (cellular) network performance metrics in zoom level 16 Web Mercator tiles (approximately 610.8 meters by 610.8 meters at the equator). Data is projected in EPSG:4326. Download speed, upload speed, and latency are collected via the Speedtest by Ookla applications for Android and iOS and averaged for each tile. Measurements are filtered to results containing GPS-quality location accuracy. The data was processed and published to ArcGIS Living Atlas by Esri.

About
Speedtest data is used today by commercial fixed and mobile network operators around the world to inform network buildout, improve global Internet quality, and increase Internet accessibility. Government regulators such as the United States Federal Communications Commission and the Malaysian Communications and Multimedia Commission use Speedtest data to hold telecommunications entities accountable and direct funds for rural and urban connectivity development. Ookla licenses data to NGOs and educational institutions to fulfill its mission: to help make the internet better, faster and more accessible for everyone. Ookla hopes to further this mission by distributing the data to make it easier for individuals and organizations to use it for the purposes of bridging the social and economic gaps between those with and without modern Internet access.

Data
Hundreds of millions of Speedtests are taken on the Ookla platform each month. In order to create a manageable dataset, we aggregate raw data into tiles. The size of a data tile is defined as a function of "zoom level" (or "z"). At z=0, the size of a tile is the size of the whole world. At z=1, the tile is split in half vertically and horizontally, creating 4 tiles that cover the globe. This tile-splitting continues as zoom level increases, causing tiles to become exponentially smaller as we zoom into a given region. By this definition, tile sizes are actually some fraction of the width/height of Earth according to the Web Mercator projection (EPSG:3857). As such, tile size varies slightly depending on latitude, but tile sizes can be estimated in meters. For the purposes of these layers, a zoom level of 16 (z=16) is used for the tiling. This equates to a tile that is approximately 610.8 meters by 610.8 meters at the equator (18 arcsecond blocks). The geometry of each tile is represented in WGS 84 (EPSG:4326) in the tile field. The data can be found at: https://github.com/teamookla/ookla-open-data

Update Cadence
The tile aggregates start in Q1 2019 and go through the most recent quarter. They will be updated shortly after the conclusion of the quarter.

Esri Processing
This layer is a best-available aggregation of the original Ookla dataset. This means that for each tile for which data is available, the most recent data is used. So, for instance, if data is available for a tile for Q2 2019 and for Q4 2020, the Q4 2020 data is awarded to the tile. The default visualization for the layer is the "broadband index". The broadband index is a bivariate index based on both the average download speed and the average upload speed. For Mobile, the score is indexed to a standard of 25 megabits per second (Mbps) download and 3 Mbps upload. A tile with average Speedtest results of 25/3 Mbps is awarded 100 points. Tiles with average speeds above 25/3 are shown in green; tiles with average speeds below this are shown in fuchsia. For Fixed, the score is indexed to a standard of 100 Mbps download and 20 Mbps upload. A tile with average Speedtest results of 100/20 Mbps is awarded 100 points. Tiles with average speeds above 100/20 are shown in green; tiles with average speeds below this are shown in fuchsia.

Tile Attributes
Each tile contains the following adjoining attributes:
The year and the quarter that the tests were performed.
The average download speed of all tests performed in the tile, represented in megabits per second.
The average upload speed of all tests performed in the tile, represented in megabits per second.
The average latency of all tests performed in the tile, represented in milliseconds.
The number of tests taken in the tile.
The number of unique devices contributing tests in the tile.
The quadkey representing the tile.

Quadkeys
Quadkeys can act as a unique identifier for the tile. This can be useful for joining data spatially from multiple periods (quarters), creating coarser spatial aggregations without using geospatial functions, spatial indexing, partitioning, and an alternative for storing and deriving the tile geometry.

Layers
There are two layers:
Ookla_Mobile_Tiles - Tiles containing tests taken from mobile devices with GPS-quality location and a cellular connection type (e.g. 4G LTE, 5G NR).
Ookla_Fixed_Tiles - Tiles containing tests taken from mobile devices with GPS-quality location and a non-cellular connection type (e.g. WiFi, ethernet).
The layers are set to draw at scales 1:3,000,000 and larger.

Time Period and Update Frequency
Layers are generated based on a quarter year of data (three months) and files will be updated and added on a quarterly basis. A /year=2020/quarter=1/ period, the first quarter of the year 2020, would include all data generated on or after 2020-01-01 and before 2020-04-01.
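Following up on the Quadkeys section above: because the tiles follow the standard Web Mercator (Bing Maps) tiling scheme, the z=16 quadkey for any longitude/latitude can be computed directly, which is handy for joining your own point data to these tiles without geospatial functions. A minimal sketch (not the publisher's own tooling):

```python
import math


def lonlat_to_quadkey(lon: float, lat: float, zoom: int = 16) -> str:
    """Convert a WGS 84 lon/lat to its Web Mercator tile quadkey at the given zoom."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # clamp to Web Mercator bounds
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    digits = []
    for z in range(zoom, 0, -1):
        digit = 0
        mask = 1 << (z - 1)
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


# Example: z=16 quadkey for a point in Seattle
print(lonlat_to_quadkey(-122.3321, 47.6062))
```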
https://www.icpsr.umich.edu/web/ICPSR/studies/36498/terms
The Population Assessment of Tobacco and Health (PATH) Study originally surveyed 45,971 adult and youth respondents. The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who use or do not use tobacco. 45,971 adults and youth constitute the first (baseline) wave of data collected by this longitudinal cohort study. These 45,971 adults and youth, along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1), make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent.

At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Units (PSUs) and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort.

Dataset 0001 (DS0001) contains the data from the Master Linkage file. This file contains 14 variables and 67,276 cases. The file provides a master list of every person's unique identification number and what type of respondent they were for each wave.

At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This second replenishment sample was combined for estimation and analysis purposes with Wave 7 adult and youth respondents from the Wave 4 Cohort who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Public-Use Files User Guide that provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts.

Dataset 1001 (DS1001) contains the data from the Wave 1 Adult Questionnaire. This data file contains 1,732 variables and 32,320 cases. Each of the cases represents a single, completed interview. Dataset 1002 (DS1002) contains the data from the Youth and Parent Questionnaire. This file contains 1,228 variables and 13,651 cases. Dataset 2001 (DS2001) contains the data from the Wave 2 Adult Questionnaire.
This data file contains 2,197 variables and 28,362 cases. Of these cases, 26,447 also completed a Wave 1 Adult Questionnaire. The other 1,915 cases are "aged-up adults" having previously completed a Wave 1 Youth Questionnaire. Dataset 2002 (DS2002) contains the data from the Wave 2 Youth and Parent Questionnaire. This data file contains 1,389 variables and 12,172 cases. Of these cases, 10,081 also completed a Wave 1 Youth Questionnaire. The other 2,091 cases are "aged-up youth" having previously been sampled as "shadow youth." Dataset 3001 (DS3001) contains the data from the Wave 3 Adult Questionnaire. This data file contains 2,139 variables and 28,148 cases. Of these cases, 26,241 are continuing adults having completed a prior Adult Questionnaire. The other 1,907 cases are "aged-up adults" having previously completed a Youth Questionnaire. Dataset 3002 (DS3002) contains the data from t
analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
• download the fixed-width file containing household, family, and person records
• import by separating this file into three tables, then merge 'em together at the person-level
• download the fixed-width file containing the person-level replicate weights
• merge the rectangular person-level file with the replicate weights, then store it in a sql database
• create a new variable - one - in the data table

2012 asec - analysis examples.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• perform a boatload of analysis examples

replicate census estimates - 2011.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• match the sas output shown in the png file below

2011 asec replicate weight sas output.png
• statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document

click here to view these three scripts. for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
• the census bureau's current population survey page
• the bureau of labor statistics' current population survey page
• the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.
Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.
There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together. A sketch of this calendar-year vs. fiscal-year grouping appears below.
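A minimal pandas sketch of that calendar-year vs. fiscal-year grouping; the column name used for the report period end date is a placeholder, and labeling a July-June fiscal year by its ending year is an assumption to check against the published files.

```python
import pandas as pd

# Placeholder column name for the report period end date.
reports = pd.DataFrame({
    "report_period_end": pd.to_datetime(["2021-06-30", "2021-12-31", "2022-03-31"]),
})

# Calendar-year grouping: the year in which the report period ends.
reports["calendar_year"] = reports["report_period_end"].dt.year

# Fiscal-year grouping (July-June), labeled here by the year containing the June end:
# a period ending July-December is assigned to the fiscal year that ends the following June.
reports["fiscal_year"] = reports["report_period_end"].dt.year + (
    reports["report_period_end"].dt.month >= 7
).astype(int)

print(reports)
```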
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This dataset is an extension of my previous work on creating a dataset for natural language processing tasks. It leverages binary representation to characterise various machine learning models. The attributes in the dataset are derived from a dictionary, which was constructed from a corpus of prompts typically provided to a large language model (LLM). These prompts reference specific machine learning algorithms and their implementations. For instance, consider a user asking an LLM or a generative AI to create a Multi-Layer Perceptron (MLP) model for a particular application. By applying this concept to multiple machine learning models, we constructed our corpus. This corpus was then transformed into the current dataset using a bag-of-words approach.

In this dataset, each attribute corresponds to a word from our dictionary, represented as a binary value: 1 indicates the presence of the word in a given prompt, and 0 indicates its absence. At the end of each entry, there is a label. Each entry in the dataset pertains to a single class, where each class represents a distinct machine learning model or algorithm. This dataset is intended for multi-class classification tasks, not multi-label classification, as each entry is associated with only one label and does not belong to multiple labels simultaneously.

This dataset has been utilised with a Convolutional Neural Network (CNN) using the Keras Automodel API, achieving impressive training and testing accuracy rates exceeding 97%. Post-training, the model's predictive performance was rigorously evaluated in a production environment, where it continued to demonstrate exceptional accuracy. For this evaluation, we employed a series of questions, which are listed below. These questions were intentionally designed to be similar to ensure that the model can effectively distinguish between different machine learning models, even when the prompts are closely related.
KNN
• How would you create a KNN model to classify emails as spam or not spam based on their content and metadata?
• How could you implement a KNN model to classify handwritten digits using the MNIST dataset?
• How would you use a KNN approach to build a recommendation system for suggesting movies to users based on their ratings and preferences?
• How could you employ a KNN algorithm to predict the price of a house based on features such as its location, size, and number of bedrooms etc?
• Can you create a KNN model for classifying different species of flowers based on their petal length, petal width, sepal length, and sepal width?
• How would you utilise a KNN model to predict the sentiment (positive, negative, or neutral) of text reviews or comments?
• Can you create a KNN model for me that could be used in malware classification?
• Can you make me a KNN model that can detect a network intrusion when looking at encrypted network traffic?
• Can you make a KNN model that would predict the stock price of a given stock for the next week?
• Can you create a KNN model that could be used to detect malware when using a dataset relating to certain permissions a piece of software may have access to?

Decision Tree
• Can you describe the steps involved in building a decision tree model to classify medical images as malignant or benign for cancer diagnosis and return a model for me?
• How can you utilise a decision tree approach to develop a model for classifying news articles into different categories (e.g., politics, sports, entertainment) based on their textual content?
• What approach would you take to create a decision tree model for recommending personalised university courses to students based on their academic strengths and weaknesses?
• Can you describe how to create a decision tree model for identifying potential fraud in financial transactions based on transaction history, user behaviour, and other relevant data?
• In what ways might you apply a decision tree model to classify customer complaints into different categories determining the severity of language used?
• Can you create a decision tree classifier for me?
• Can you make me a decision tree model that will help me determine the best course of action across a given set of strategies?
• Can you create a decision tree model for me that can recommend certain cars to customers based on their preferences and budget?
• How can you make a decision tree model that will predict the movement of star constellations in the sky based on data provided by the NASA website?
• How do I create a decision tree for time-series forecasting?

Random Forest
• Can you describe the steps involved in building a random forest model to classify different types of anomalies in network traffic data for cybersecurity purposes and return the code for me?
• In what ways could you implement a random forest model to predict the severity of traffic congestion in urban areas based on historical traffic patterns, weather...
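To make the bag-of-words encoding described above concrete, here is a minimal scikit-learn sketch; the prompts and labels are illustrative stand-ins for the real corpus, and the actual dictionary will of course differ.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative prompts and class labels (the real corpus is much larger).
prompts = [
    "Can you create a KNN model for classifying flowers by petal and sepal measurements?",
    "How do I create a decision tree for time-series forecasting?",
    "Can you describe the steps in building a random forest model for network anomalies?",
]
labels = ["KNN", "DecisionTree", "RandomForest"]

# binary=True yields 1 if a dictionary word appears in the prompt and 0 otherwise,
# matching the presence/absence encoding described for this dataset.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(prompts)

print(vectorizer.get_feature_names_out()[:10])
print(X.toarray())
```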
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wages in Mexico decreased to 278.93 MXN/day in May 2025 from 621.89 MXN/day in April 2025. This dataset provides Mexico Average Daily Wages: actual values, historical data, forecast, chart, statistics, economic calendar and news.
https://www.icpsr.umich.edu/web/ICPSR/studies/36231/terms
The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who use or do not use tobacco. 45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent. At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population (CNP) at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Unit (PSU)s and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the CNP at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort. At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the CNP at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This "second replenishment sample" was combined for estimation and analysis purposes with the Wave 7 adult and youth respondents from the Wave 4 Cohorts who were at least age 15 and in the CNP at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Restricted-Use Files User Guide that provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts. Dataset 0002 (DS0002) contains the data from the State Design Data. This file contains 7 variables and 82,139 cases. The state identifier in the State Design file reflects the participant's state of residence at the time of selection and recruitment for the PATH Study. Dataset 1011 (DS1011) contains the data from the Wave 1 Adult Questionnaire. This data file contains 2,021 variables and 32,320 cases. Each of the cases represents a single, completed interview. Dataset 1012 (DS1012) contains the data from the Wave 1 Youth and Parent Questionnaire. This file contains 1,431 variables and 13,651 cases. Dataset 1411 (DS1411) contains the Wave 1 State Identifier data for Adults and has 5 variables and 32,320 cases. Dataset 1412 (DS1412) contains the Wave 1 State Identifier data for Youth (and Parents) and has 5 variables and 13,651 cases. 
The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state Federal Information Processing System (FIPS), state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 1, which is also their state of residence at the time of recruitment. Dataset 1611 (DS1611) contains the Tobacco Universal Product Code (UPC) data from Wave 1. This data file contains 32 variables and 8,601 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 1. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used
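As a rough illustration of that linkage, here is a minimal pandas sketch joining a questionnaire extract to the State Identifier data on PERSONID; the CSV file names are placeholders for however you export the ICPSR files locally, so only the PERSONID key comes from the description above.

```python
import pandas as pd

# Placeholder file names -- use whatever names your local ICPSR export produces.
adult_wave1 = pd.read_csv("wave1_adult_questionnaire.csv")
state_ids = pd.read_csv("wave1_adult_state_identifier.csv")

# PERSONID links the State Identifier data to the questionnaire (and biomarker) data.
adult_with_state = adult_wave1.merge(state_ids, on="PERSONID", how="left")
print(adult_with_state.head())
```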
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
SynQA is a Reading Comprehension dataset created in the work "Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation" (https://aclanthology.org/2021.emnlp-main.696/). It consists of 314,811 synthetically generated questions on the passages in the SQuAD v1.1 (https://arxiv.org/abs/1606.05250) training set.
In this work, we use synthetic adversarial data generation to make QA models more robust to human adversaries. We develop a data generation pipeline that selects source passages, identifies candidate answers, generates questions, and then filters or re-labels them to improve quality. Using this approach, we amplify a smaller human-written adversarial dataset into a much larger set of synthetic question-answer pairs. By incorporating our synthetic data, we improve the state-of-the-art on the AdversarialQA (https://adversarialqa.github.io/) dataset by 3.7 F1 and improve model generalisation on nine of the twelve MRQA datasets. We further conduct a novel human-in-the-loop evaluation to show that our models are considerably more robust to new human-written adversarial examples: crowdworkers can fool our model only 8.8% of the time on average, compared to 17.6% for a model trained without synthetic data.
For full details on how the dataset was created, kindly refer to the paper.