100+ datasets found

Examples of studies that used presence-absence data to compute Jaccard’s...
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
Kumar P. Mainali; Sharon Bewick; Peter Thielen; Thomas Mehoke; Florian P. Breitwieser; Shishir Paudel; Arjun Adhikari; Joshua Wolfe; Eric V. Slud; David Karig; William F. Fagan (2023). Examples of studies that used presence-absence data to compute Jaccard’s similarity index (J) for determining similarity between systems (e.g., between taxa-pairs, between sites, between markets) where the statistical significance of J is faulty and the use of observed value of J as a similarity metric is flawed. [Dataset]. http://doi.org/10.1371/journal.pone.0187132.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0187132.t002
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Kumar P. Mainali; Sharon Bewick; Peter Thielen; Thomas Mehoke; Florian P. Breitwieser; Shishir Paudel; Arjun Adhikari; Joshua Wolfe; Eric V. Slud; David Karig; William F. Fagan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Examples of studies that used presence-absence data to compute Jaccard’s similarity index (J) for determining similarity between systems (e.g., between taxa-pairs, between sites, between markets) where the statistical significance of J is faulty and the use of observed value of J as a similarity metric is flawed.
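For reference, Jaccard's index for two presence-absence vectors is the number of shared presences divided by the number of items present in at least one vector. The sketch below is only an illustration of that definition: the function name `jaccard_index` and the sample vectors are hypothetical and are not taken from the dataset, and raising an error for two all-zero vectors is a convention chosen here.

```python
def jaccard_index(a, b):
    """Jaccard similarity J for two equal-length presence-absence (0/1) vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # present in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # present in at least one
    # Assumption made for this sketch: J is treated as undefined when nothing is present.
    if either == 0:
        raise ValueError("J is undefined when both vectors are all zeros")
    return both / either


# Hypothetical presence-absence records for two taxa across six sites.
taxon_a = [1, 1, 0, 1, 0, 0]
taxon_b = [1, 0, 0, 1, 1, 0]
print(jaccard_index(taxon_a, taxon_b))  # 2 shared / 4 occupied = 0.5
```

The observed value alone says nothing about significance, which is the concern the dataset description raises.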
Data from: Nursing Home Compare
catalog.data.gov
data.va.gov
+2 more
Updated May 1, 2021
Cite
Department of Veterans Affairs (2021). Nursing Home Compare [Dataset]. https://catalog.data.gov/dataset/nursing-home-compare-ed7b0
Explore at:
Dataset updated
May 1, 2021
Dataset provided by
United States Department of Veterans Affairs http://va.gov/
Description
Nursing Home Compare has detailed information about every Medicare and Medicaid nursing home in the country. A nursing home is a place for people who can’t be cared for at home and need 24-hour nursing care. These are the official datasets used on the Medicare.gov Nursing Home Compare Website provided by the Centers for Medicare & Medicaid Services. These data allow you to compare the quality of care at every Medicare and Medicaid-certified nursing home in the country, including over 15,000 nationwide.
Data from: Robust Leave-One-Out Cross-Validation for High-Dimensional...
tandf.figshare.com
pdf
Updated Nov 9, 2023
Cite
Luca Alessandro Silva; Giacomo Zanella (2023). Robust Leave-One-Out Cross-Validation for High-Dimensional Bayesian Models [Dataset]. http://doi.org/10.6084/m9.figshare.24167959.v2
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.24167959.v2
Dataset updated
Nov 9, 2023
Dataset provided by
Taylor & Francis
Authors
Luca Alessandro Silva; Giacomo Zanella
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Leave-one-out cross-validation (LOO-CV) is a popular method for estimating out-of-sample predictive accuracy. However, computing LOO-CV criteria can be computationally expensive due to the need to fit the model multiple times. In the Bayesian context, importance sampling provides a possible solution but classical approaches can easily produce estimators whose asymptotic variance is infinite, making them potentially unreliable. Here we propose and analyze a novel mixture estimator to compute Bayesian LOO-CV criteria. Our method retains the simplicity and computational convenience of classical approaches, while guaranteeing finite asymptotic variance of the resulting estimators. Both theoretical and numerical results are provided to illustrate the improved robustness and efficiency. The computational benefits are particularly significant in high-dimensional problems, allowing to perform Bayesian LOO-CV for a broader range of models, and datasets with highly influential observations. The proposed methodology is easily implementable in standard probabilistic programming software and has a computational cost roughly equivalent to fitting the original model once. Supplementary materials for this article are available online.
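The classical importance-sampling approach that the abstract contrasts against can be sketched as follows. This is not the authors' mixture estimator. It is the standard estimator that reweights full-data posterior draws by the inverse likelihood of the held-out point, which reduces to a harmonic mean of pointwise likelihoods. The function name and the input values are hypothetical.

```python
def is_loo_predictive_density(pointwise_likelihoods):
    """Classical importance-sampling estimate of p(y_i | y_{-i}).

    pointwise_likelihoods: p(y_i | theta_s) evaluated at S posterior draws
    theta_s obtained from the model fitted to the full data.
    """
    if not pointwise_likelihoods:
        raise ValueError("need at least one posterior draw")
    # Importance weights are proportional to 1 / p(y_i | theta_s). A draw with
    # a tiny likelihood gets a huge weight, which is where the variance
    # problem of the classical estimator comes from.
    inverse_mean = sum(1.0 / p for p in pointwise_likelihoods) / len(pointwise_likelihoods)
    return 1.0 / inverse_mean


# Made-up likelihood values for one observation at four posterior draws.
print(is_loo_predictive_density([0.5, 0.25, 0.5, 0.25]))
```

A single draw with a very small likelihood dominates the average of inverses, so the estimate can swing sharply between runs.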
ACS 1-Year Comparison Profiles
datasets.ai
catalog.data.gov
2
Updated Sep 19, 2024
+ more versions
Cite
Department of Commerce (2024). ACS 1-Year Comparison Profiles [Dataset]. https://datasets.ai/datasets/acs-1-year-comparison-profiles-ec468
Explore at:
2 available download formats
Dataset updated
Sep 19, 2024
Dataset provided by
United States Department of Commerce http://www.commerce.gov/
Authors
Department of Commerce
Description
The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. Much of the ACS data provided on the Census Bureau's Web site are available separately by age group, race, Hispanic origin, and sex. Summary files, Subject tables, Data profiles, and Comparison profiles are available for the nation, all 50 states, the District of Columbia, Puerto Rico, every congressional district, every metropolitan area, and all counties and places with populations of 65,000 or more. Comparison profiles are similar to data profiles but also include comparisons with past-year data. The current year data are compared with each of the last four years of data and include statistical significance testing. There are over 1,000 variables in this dataset.
Data from: S8 Fig -
plos.figshare.com
zip
Updated Aug 3, 2023
+ more versions
Cite
Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc (2023). S8 Fig - [Dataset]. http://doi.org/10.1371/journal.pone.0289211.s009
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0289211.s009
Dataset updated
Aug 3, 2023
Dataset provided by
PLOS ONE
Authors
Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Deep learning (DL) techniques have seen tremendous interest in medical imaging, particularly in the use of convolutional neural networks (CNNs) for the development of automated diagnostic tools. The facility of its non-invasive acquisition makes retinal fundus imaging particularly amenable to such automated approaches. Recent work in the analysis of fundus images using CNNs relies on access to massive datasets for training and validation, composed of hundreds of thousands of images. However, data residency and data privacy restrictions stymie the applicability of this approach in medical settings where patient confidentiality is a mandate. Here, we showcase results for the performance of DL on small datasets to classify patient sex from fundus images—a trait thought not to be present or quantifiable in fundus images until recently. Specifically, we fine-tune a Resnet-152 model whose last layer has been modified to a fully-connected layer for binary classification. We carried out several experiments to assess performance in the small dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to a mere 25% decrease in performance despite a nearly 1000-fold decrease in the dataset size compared to prior results in the literature. Our results show that binary classification, even with a hard task such as sex categorization from retinal fundus images, is possible with very small datasets. Our domain adaptation results show that models trained with one distribution of images may generalize well to an independent external source, as in the case of models trained on DOVS and tested on ODIR. Our results also show that eliminating poor quality images may hamper training of the CNN due to reducing the already small dataset size even further. 
Nevertheless, using high quality images may be an important factor as evidenced by superior generalizability of results in the domain adaptation experiments. Finally, our work shows that ensembling is an important tool in maximizing performance of deep CNNs in the context of small development datasets.
Data from: A semi-simulated EEG/EOG dataset for the comparison of EOG...
researchdata.aston.ac.uk
data.mendeley.com
Updated May 20, 2016
Cite
Manousos Klados; Panagiotis Bamidis (2016). A semi-simulated EEG/EOG dataset for the comparison of EOG artifact rejection techniques [Dataset]. http://doi.org/10.17632/wb6yvr725d.3
Explore at:
Unique identifier
https://doi.org/10.17632/wb6yvr725d.3
Dataset updated
May 20, 2016
Authors
Manousos Klados; Panagiotis Bamidis
Description
This work presents a semi-simulated EEG dataset, where artifact-free EEG signals are manually contaminated with ocular artifacts following the model proposed by [1]. The significant part of this dataset is that it contains the pre-contamination EEG signals, so the brain signals underlying the EOG artifacts are known and thus the performance of every artifact rejection technique can be objectively assessed. The main difference between the proposed dataset and others (e.g., see [2,3]) is that it focuses only on EOG artifacts, using a realistic model for the contamination of artifact-free EEGs rather than a random procedure.

[1] T. Elbert, W. Lutzenberger, B. Rockstroh, N. Birbaumer, Removal of ocular artifacts from the EEG--a biophysical approach to the EOG., Electroencephalogr. Clin. Neurophysiol. 60 (1985) 455–63. http://www.ncbi.nlm.nih.gov/pubmed/2580697 (accessed April 10, 2013).

[2] X. Yong, M. Fatourechi, R.K. Ward, G.E. Birch, Automatic artefact removal in a self-paced hybrid brain- computer interface system, J. Neuroeng. Rehabil. 9 (2012) 50. doi:10.1186/1743-0003-9-50.

[3] A.K. Abdullah, C.Z. Zhang, A.A.A. Abdullah, S. Lian, Automatic Extraction System for Common Artifacts in EEG Signals Based on Evolutionary Stone’s BSS Algorithm, Math. Probl. Eng. 2014 (2014) 1–25. doi:10.1155/2014/324750.
Stanford, IL annual median income by work experience and sex dataset: Aged...
neilsberg.com
csv, json
Updated Feb 27, 2025
+ more versions
Cite
Neilsberg Research (2025). Stanford, IL annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a539041b-f4ce-11ef-8577-3860777c1fe6/
Explore at:
Available download formats: csv, json
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Stanford, Illinois
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Stanford. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

Key observations: Insights from 2023

Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:
- All workers, aged 15 years and older: In Stanford, the median income for all workers aged 15 years and older, regardless of work hours, was $55,625 for males and $36,902 for females. Women, regardless of work hours, earn 66 cents for each dollar earned by men, a substantial gender pay gap of approximately 34% in the village of Stanford.
- Full-time workers, aged 15 years and older: In Stanford, among full-time, year-round workers aged 15 years and older, males earned a median income of $60,887, while females earned $46,389, a 24% gender pay gap. Women therefore earn 76 cents for each dollar earned by men in full-time roles.
The pay gap is larger across all workers (34%) than among full-time, year-round workers (24%). This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Stanford.
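The percentages above can be reproduced from the quoted medians. The sketch below assumes the gap is computed as one minus the female-to-male ratio of median incomes, which is consistent with the published figures but is not stated in the source. The helper name `pay_gap` is hypothetical.

```python
def pay_gap(male_median, female_median):
    """Return (cents earned by women per dollar earned by men, gap in percent)."""
    ratio = female_median / male_median
    return round(ratio * 100), round((1 - ratio) * 100)


# Median incomes quoted above for Stanford, IL (ACS 2019-2023 5-Year Estimates).
print(pay_gap(55_625, 36_902))  # all workers -> (66, 34)
print(pay_gap(60_887, 46_389))  # full-time, year-round workers -> (76, 24)
```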

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

Gender classifications include:

Male

Female

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Variables / Data Columns

Year: This column presents the data year. Expected values are 2010 to 2023

Male Total Income: Annual median income, for males regardless of work hours

Male FT Income: Annual median income, for males working full time, year-round

Male PT Income: Annual median income, for males working part time

Female Total Income: Annual median income, for females regardless of work hours

Female FT Income: Annual median income, for females working full time, year-round

Female PT Income: Annual median income, for females working part time

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Stanford median household income by race.
Rumford, Maine annual median income by work experience and sex dataset: Aged...
neilsberg.com
csv, json
Updated Feb 27, 2025
+ more versions
Cite
Neilsberg Research (2025). Rumford, Maine annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a5349471-f4ce-11ef-8577-3860777c1fe6/
Explore at:
Available download formats: csv, json
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Rumford, Maine
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Rumford town. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

Key observations: Insights from 2023

Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:
- All workers, aged 15 years and older: In Rumford town, the median income for all workers aged 15 years and older, regardless of work hours, was $34,124 for males and $22,643 for females. Women, regardless of work hours, earn 66 cents for each dollar earned by men, a substantial gender pay gap of approximately 34% in the town of Rumford.
- Full-time workers, aged 15 years and older: In Rumford town, among full-time, year-round workers aged 15 years and older, males earned a median income of $60,964, while females earned $43,807, a 28% gender pay gap. Women therefore earn 72 cents for each dollar earned by men in full-time roles.
The pay gap is larger across all workers (34%) than among full-time, year-round workers (28%). This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Rumford town.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

Gender classifications include:

Male

Female

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Variables / Data Columns

Year: This column presents the data year. Expected values are 2010 to 2023

Male Total Income: Annual median income, for males regardless of work hours

Male FT Income: Annual median income, for males working full time, year-round

Male PT Income: Annual median income, for males working part time

Female Total Income: Annual median income, for females regardless of work hours

Female FT Income: Annual median income, for females working full time, year-round

Female PT Income: Annual median income, for females working part time

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Rumford town median household income by race.
LLMWorldOfWords/LWOW: First release
zenodo.org
zip
Updated Apr 30, 2025
Cite
Katherine Elizabeth Abramski; Riccardo Improta; Giulio Rossetti; Massimo Stella (2025). LLMWorldOfWords/LWOW: First release [Dataset]. http://doi.org/10.5281/zenodo.15222294
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15222294
Dataset updated
Apr 30, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Katherine Elizabeth Abramski; Riccardo Improta; Giulio Rossetti; Massimo Stella
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 15, 2025
Area covered
Lviv
Description
The "LLM World of Words" (LWOW) [1] is a collection of datasets of English free association norms generated by various large language models (LLMs). Currently, the collection consists of datasets generated by Mistral, LLaMA3, and Claude Haiku. The datasets are modeled after the "Small World of Words" (SWOW) (https://smallworldofwords.org/en/project/) [2] English free association norms, generated by humans, consisting of over 12,000 cue words and over 3 million responses. The purpose of the LWOW datasets is to provide a way to investigate various aspects of the semantic memory of LLMs using an approach that has been applied extensively for investigating the semantic memory of humans. These datasets, together with the SWOW dataset, can be used to gain insights about similarities and differences in the language structures possessed by humans and LLMs.

What are free associations?

Free associations are implicit mental connections between words or concepts. They are typically accessed by presenting humans (or AI agents) with a cue word and then asking them to respond with the first words that come to mind. The responses represent implicit associations that connect different concepts in the mind, reflecting the semantic representations that underlie patterns of thought, memory, and language. For example, given the cue word "woman", a common free association response might be "man", reflecting the associative mental relation between these two concepts.

How can they be used?

Free associations have been extensively used in cognitive psychology and linguistics as a tool for studying language and cognitive information processing. They provide a way for researchers to understand how conceptual knowledge is organized and accessed in the mind. Free associations are often used to build network models of semantic memory by connecting cue words to their responses. When thousands of cues and responses are connected in this way, the result is a complex network model that represents the organization of semantic knowledge. Such models enable the investigation of complex cognitive processes that take place within semantic memory, and can be used to study a variety of cognitive phenomena such as language learning, creativity, personality traits, and cognitive biases.
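As a rough illustration of connecting cue words to their responses, the sketch below turns cue-response pairs into a weighted edge list. It is not the authors' FA_build_Networks.py. The function name, the choice of undirected edges, the dropping of self-loops, and the sample pairs are all assumptions made for this example.

```python
from collections import Counter


def build_edge_list(associations):
    """Turn (cue, response) pairs into a weighted, undirected edge list.

    Edge weight = number of times the pair was produced. Treating edges as
    undirected and dropping self-loops are simplifying assumptions.
    """
    counts = Counter()
    for cue, response in associations:
        cue, response = cue.strip().lower(), response.strip().lower()
        if cue and response and cue != response:
            counts[tuple(sorted((cue, response)))] += 1
    return [(a, b, w) for (a, b), w in sorted(counts.items())]


# Hypothetical cue-response pairs, not taken from any dataset.
pairs = [("woman", "man"), ("man", "woman"), ("doctor", "nurse"), ("doctor", "hospital")]
for edge in build_edge_list(pairs):
    print(edge)
```

In this toy input, "woman" and "man" cue each other, so that edge ends up with weight 2 while the others have weight 1.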

Validation of the datasets with semantic priming

The LWOW datasets were validated using data from the Semantic Priming Project (https://www.montana.edu/attmemlab/spp.html) [3], which implements a lexical decision task (LDT) to study semantic priming. The semantic priming effect is the cognitive phenomenon that a target word (e.g. nurse) is more easily recognized when it is prompted by a related prime word (e.g. doctor) compared to an unrelated prime word (e.g. doctrine). We simulated the semantic priming effect within network models of semantic memory built from both the LWOW and the SWOW free association norms by implementing spreading activation processes within the networks [4]. We found that the final activation levels of prime-target pairs correlated significantly with reaction time data for the same prime-target pairs from the LDT. Specifically, the activation of a target node (e.g. nurse) is higher when a related prime node (e.g. doctor) is activated compared to an unrelated prime node (e.g. doctrine). These results demonstrate how the LWOW datasets can be used for investigating cognitive and linguistic phenomena in LLMs, demonstrating the validity of the datasets.
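A toy version of spreading activation may help make the priming simulation concrete. The sketch below is a simplified scheme written for this illustration, not the authors' implementation: at each step every node keeps a fixed fraction of its activation and passes the rest to its neighbours in equal shares. The function name, the parameters `steps`, `retention`, and `initial`, and the tiny network are all hypothetical.

```python
def spread_activation(graph, prime, steps=2, retention=0.5, initial=100.0):
    """Simplified spreading activation on an undirected graph.

    graph: dict mapping each node to a list of its neighbours.
    At every step each node keeps `retention` of its activation and passes
    the remainder to its neighbours in equal shares.
    """
    activation = {node: 0.0 for node in graph}
    activation[prime] = initial
    for _ in range(steps):
        updated = {node: 0.0 for node in graph}
        for node, amount in activation.items():
            neighbours = graph[node]
            if not neighbours:  # an isolated node keeps everything
                updated[node] += amount
                continue
            updated[node] += retention * amount
            share = (1.0 - retention) * amount / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        activation = updated
    return activation


# Toy network (hypothetical): "doctor" is linked to "nurse", "doctrine" is not.
toy = {
    "doctor": ["nurse", "hospital"],
    "nurse": ["doctor", "hospital"],
    "hospital": ["doctor", "nurse"],
    "doctrine": ["belief"],
    "belief": ["doctrine"],
}
from_related = spread_activation(toy, "doctor")["nurse"]
from_unrelated = spread_activation(toy, "doctrine")["nurse"]
print(from_related > from_unrelated)  # True
```

Activating the related prime leaves the target with non-zero activation, while the unrelated prime never reaches it in this toy graph.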

Investigating gender biases

To demonstrate how this dataset can be used to investigate gender biases in LLMs compared to humans, we conducted an analysis using network models of semantic memory built from both the LWOW and the SWOW free association norms. We applied a methodology that simulates semantic priming within the networks to measure the strength of association between pairs of concepts, for example, "woman" and "forceful" vs. "man" and "forceful". We applied this methodology using a set of female-related and male-related primes, and a set of female-related and male-related targets. This analysis revealed that certain adjectives like "forceful" and "strong" are more strongly associated with certain genders, shedding light on the types of stereotypical gender biases that both humans and LLMs possess.

Technical notes

The free associations were generated (either via API or locally, depending on the LLM) by providing each LLM with a set of cue words and the following prompt: "You will be provided with an input word. Write the first 3 words you associate to it separated by a comma." This prompt was repeated 100 times for each cue word, resulting in a dataset of 11,545 unique cue words and 3,463,500 total responses for each LLM.
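The stated total follows directly from the three numbers in the paragraph above, as this short check shows. The constant names are chosen for this example only.

```python
CUES = 11_545             # unique cue words (from the text)
REPETITIONS = 100         # prompt repetitions per cue (from the text)
RESPONSES_PER_PROMPT = 3  # "first 3 words" requested by the prompt

total_responses = CUES * REPETITIONS * RESPONSES_PER_PROMPT
print(total_responses)  # 3463500
```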

How to access and use the datasets

The LWOW datasets for Mistral, Llama3, and Haiku can be found in the LWOW_datasets folder, which contains two subfolders. The .csv files of the processed cues and responses can be found in the processed_datasets folder while the .csv files of the edge lists of the semantic networks constructed from the datasets can be found in the graphs/edge_lists folder.

Since the LWOW datasets are intended to be used in comparison to humans, we have further processed the original SWOW dataset to create a Human dataset that is aligned with the processing that we applied to the LWOW datasets. While this Human dataset is not included in this repository due to the license of the original SWOW dataset, it can be easily reproduced by running the code provided in the reproducibility folder. We highly encourage you to generate this dataset as it enables a direct comparison between humans and LLMs. The Human dataset can be generated with the following steps:

Go to the SWOW research page (https://smallworldofwords.org/en/project/research) [2] and download the English processed data (SWOW-EN18). Save this .csv file with the name "SWOW-EN.R100.csv" in the reproducibility/data/original_datasets folder.

Run the python file FA_data_Cleaning.py saved in the reproducibility folder. This will generate a .csv of the processed Human dataset, which will be saved in the reproducibility/data/processed_datasets folder. Note that this python script will also regenerate the .csv files of the processed LWOW datasets (the same that can be found in the LWOW_datasets/processed_datasets folder).

Run the python file FA_build_Networks.py saved in the reproducibility folder. This will generate a .csv of the edge list of the semantic network constructed from the Human dataset, which will be saved in the reproducibility/data/graphs/edge_lists folder. Note that this python script will also regenerate the .csv files of the edge lists of the LLM networks (the same that can be found in the LWOW_datasets/graphs/edge_lists folder). This python script will also produce igraph versions of all the semantic networks.

How to reproduce the data and analyses

To reproduce the analyses, first the required external files need to be downloaded:

Go to the SWOW research page (https://smallworldofwords.org/en/project/research) [2] and download the English data SWOW-EN18. Save this .csv file with the name "SWOW-EN.R100.csv" in the reproducibility/data/original_datasets folder.

Go to the Semantic Priming Project (https://www.montana.edu/attmemlab/spp.html) [3] and download the LDT Priming Data. Save this .csv file with the name "primingLDT_data.csv" in the reproducibility/data/LDT_analyses folder.

Once the files are saved in the correct folders, follow the instructions in each script, which can be found in the reproducibility folder. The scripts should be run in the following order:

FA_data_Generation.py: generates the raw LLM datasets

FA_data_Cleaning.py: processes the original SWOW dataset and the raw LLM datasets

FA_build_Networks.py: builds the semantic networks from the datasets

FA_analyses_LDT_Gender.py and FA_spreadr.r: implement spreading activation processes within the networks in order to validate the datasets and investigate gender biases

Do you want to know more? Read the Preprint!

Abramski, K., et al. (2024). The "LLM World of Words" English free association norms generated by large language models (https://arxiv.org/abs/2412.01330)

Funding & Legal

SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021;

EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).

The HumaneAI-Net project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 952026.

COGNOSCO grant funded by Università di Trento (Grant ID: PS 22_27).

For speaking requests and enquiries, please contact:

Katherine Abramski : katherine.abramski@phd.unipi.it

Giulio Rossetti : giulio.rossetti@isti.cnr.it

Massimo Stella : massimo.stella-1@unitn.it

References

[1] Abramski, K., et al. (2024). The "LLM World of Words" English free association norms generated
Dataset from A Phase 4 Comparison of Duloxetine Dosing Strategies in the...
data.niaid.nih.gov
Updated Feb 22, 2025
Call 1-877-CTLILLY (1-877-285-4559) or 1-317-615-4559 Mon-Fri 9AM - 5PM Eastern time (UTC/GMT - 5hours, EST) (2025). Dataset from A Phase 4 Comparison of Duloxetine Dosing Strategies in the Treatment of Korean Patients With Major Depressive Disorder [Dataset]. http://doi.org/10.25934/00004464
Unique identifier
https://doi.org/10.25934/00004464
Dataset updated
Feb 22, 2025
Dataset provided by
Eli Lilly and Company (https://lilly.com/)
Authors
Call 1-877-CTLILLY (1-877-285-4559) or 1-317-615-4559 Mon-Fri 9AM - 5PM Eastern time (UTC/GMT - 5hours, EST)
Area covered
Republic of Korea
Variables measured
Adverse Event, Clinical Global Impression, Gastrointestinal Sensation - Finding, Hamilton Rating Scale For Depression, Patient Global Impression of Improvement
Description
The purpose of this study is to assess nausea severity in response to four different drug dosing strategies of Duloxetine (30 mg with food, 60 mg with food, 30 mg without food, and 60 mg without food) in Korean patients with major depressive disorder (MDD).
Sutherlin, OR annual median income by work experience and sex dataset: Aged...
neilsberg.com
csv, json
Updated Feb 27, 2025
Neilsberg Research (2025). Sutherlin, OR annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/sutherlin-or-income-by-gender/
Available download formats: csv, json
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Sutherlin
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Sutherlin. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

Key observations: Insights from 2023

Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

- All workers, aged 15 years and older: In Sutherlin, the median income for all workers aged 15 years and older, regardless of work hours, was $42,190 for males and $25,678 for females. Women, regardless of work hours, earn 61 cents for each dollar earned by men, a gender pay gap of approximately 39%.

- Full-time workers, aged 15 years and older: Among full-time, year-round workers aged 15 years and older in Sutherlin, males earned a median income of $58,735 and females earned $41,581, a 29% gender pay gap. Women earn 71 cents for each dollar earned by men in full-time roles, so a substantial income disparity persists even among full-time workers.

The gender pay gap was higher across all workers (39%), a group that includes non-full-time employment, than among full-time workers (29%). This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Sutherlin.
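The cents-per-dollar and pay-gap figures quoted above follow directly from the two median incomes. A minimal sketch of that arithmetic, using the Sutherlin medians from the text; the function name `pay_gap` is illustrative and not part of the dataset:

```python
def pay_gap(male_median, female_median):
    """Return (cents earned by women per dollar earned by men, pay gap in percent)."""
    ratio = female_median / male_median
    cents_per_dollar = round(ratio * 100)
    gap_percent = round((1 - ratio) * 100)
    return cents_per_dollar, gap_percent


# Medians quoted in the key observations for Sutherlin (2023 dollars).
print(pay_gap(42190, 25678))  # all workers       -> (61, 39)
print(pay_gap(58735, 41581))  # full-time workers -> (71, 29)
```

The two numbers in each pair always sum to 100 up to rounding, because the gap is defined as one minus the female-to-male ratio.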

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

Gender classifications include:

Male

Female

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Variables / Data Columns

Year: This column presents the data year. Expected values are 2010 to 2023

Male Total Income: Annual median income, for males regardless of work hours

Male FT Income: Annual median income, for males working full time, year-round

Male PT Income: Annual median income, for males working part time

Female Total Income: Annual median income, for females regardless of work hours

Female FT Income: Annual median income, for females working full time, year-round

Female PT Income: Annual median income, for females working part time

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Sutherlin median household income by race.
Income Distribution by Quintile: Mean Household Income in Great Neck Plaza,...
neilsberg.com
csv, json
Updated Jan 11, 2024
Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Great Neck Plaza, NY [Dataset]. https://www.neilsberg.com/research/datasets/949af105-7479-11ee-949f-3860777c1fe6/
Available download formats: csv, json
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Great Neck Plaza, New York
Variables measured
Income Level, Mean Household Income
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents the mean household income for each of the five quintiles in Great Neck Plaza, NY, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

Key observations

Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $15,044, while the mean income of the highest quintile (the 20% of households with the highest income) is $344,968. Households in the highest quintile therefore earn, on average, about 23 times as much as those in the lowest quintile.

Top 5%: The mean household income for the wealthiest population (top 5%) is $567,477, which is 164.50% of the mean for the highest quintile (64.50% higher) and 3772.12% of the mean for the lowest quintile (3672.12% higher).
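The comparisons above are ratios of the quoted mean incomes. A minimal sketch of the arithmetic follows, keeping "X% of" separate from "X% higher than" because the two are easy to confuse; the function name `compare` is illustrative and not part of the dataset:

```python
def compare(value, base):
    """Return (value as a percentage of base, percent by which value exceeds base)."""
    percent_of = value / base * 100
    percent_higher = percent_of - 100
    return percent_of, percent_higher


# Mean household incomes quoted above for Great Neck Plaza, NY.
lowest, highest, top5 = 15044, 344968, 567477

print(f"{highest / lowest:.1f}")  # highest vs. lowest quintile -> 22.9 (about 23 times)

of_highest, above_highest = compare(top5, highest)
print(f"{of_highest:.1f} {above_highest:.1f}")  # -> 164.5 64.5

of_lowest, above_lowest = compare(top5, lowest)
print(f"{of_lowest:.1f} {above_lowest:.1f}")    # -> 3772.1 3672.1
```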

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Income Levels:

Lowest Quintile

Second Quintile

Third Quintile

Fourth Quintile

Highest Quintile

Top 5 Percent

Variables / Data Columns

Income Level: This column lists the income levels (as listed above).

Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Great Neck Plaza median household income.
Presentation of the statistic indicators of our model for the 187 images of...
plos.figshare.com
xls
Updated Jun 2, 2023
Cristian Crisosto; Andreas Voskrebenzev; Marcel Gutberlet; Filip Klimeš; Till F. Kaireit; Gesa Pöhler; Tawfik Moher; Lea Behrendt; Robin Müller; Maximilian Zubke; Frank Wacker; Jens Vogel-Claussen (2023). Presentation of the statistic indicators of our model for the 187 images of the test dataset without consolidations and the 38 images of the test dataset with consolidations; the mean Sørensen-Dice similarity (SDC ± standard deviation), p-values distribution of the SD and the mean Hausdorff (HD) distance coefficient with the corresponding p-values distribution of the HD. [Dataset]. http://doi.org/10.1371/journal.pone.0285378.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0285378.t002
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Cristian Crisosto; Andreas Voskrebenzev; Marcel Gutberlet; Filip Klimeš; Till F. Kaireit; Gesa Pöhler; Tawfik Moher; Lea Behrendt; Robin Müller; Maximilian Zubke; Frank Wacker; Jens Vogel-Claussen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Presentation of the statistic indicators of our model for the 187 images of the test dataset without consolidations and the 38 images of the test dataset with consolidations; the mean Sørensen-Dice similarity (SDC ± standard deviation), p-values distribution of the SD and the mean Hausdorff (HD) distance coefficient with the corresponding p-values distribution of the HD.
Data from: Variable KOC and Poor-Quality Data Sources Cause High Discrepancy...
acs.figshare.com
xlsx
Updated Jan 10, 2025
Fu Liu; Fan Fan; Qingmiao Yu; Hongqiang Ren; Jinju Geng (2025). Variable KOC and Poor-Quality Data Sources Cause High Discrepancy in Current Mobility Assessment of Organic Substances [Dataset]. http://doi.org/10.1021/acsestwater.4c00731.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acsestwater.4c00731.s001
Dataset updated
Jan 10, 2025
Dataset provided by
ACS Publications
Authors
Fu Liu; Fan Fan; Qingmiao Yu; Hongqiang Ren; Jinju Geng
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The widespread distribution of persistent, mobile, and toxic organic chemicals (PMT) in aquatic environments poses a threat to water resources. Current mobility assessments rely on the organic carbon normalized adsorption coefficient (KOC), but it is sometimes highly variable with sorptive phase (soil/sediment) properties. There is a common oversight that this variability causes assessment discrepancies. Herein, this variability was quantitatively evaluated based on compiled experimental KOC data sets, which were obtained under OECD guidelines. The results show that both the average discrepancy rate and relative difference rate are nearly half of those of the substances among recent reports. The underlying reasons are high KOC variability and poor-quality assessment data sources which fail to capture this variability. The variation in KOC values for one-third of the charged organic compounds is more than 1 order of magnitude, around twice higher than that of neutral organic compounds. The KOC values from common integrated databases or available quantitative structure–property relationships all have almost orders of magnitude differences compared with data sets, especially for charged compounds. The insights presented here have significant value in the future development of a proper mobility assessment.
Swansea, IL annual median income by work experience and sex dataset: Aged...
neilsberg.com
csv, json
Updated Feb 27, 2025
Neilsberg Research (2025). Swansea, IL annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a53a69ef-f4ce-11ef-8577-3860777c1fe6/
Available download formats: json, csv
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Swansea, Illinois
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Swansea. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

Key observations: Insights from 2023

Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

- All workers, aged 15 years and older: In Swansea, the median income for all workers aged 15 years and older, regardless of work hours, was $48,750 for males and $30,417 for females. Women, regardless of work hours, earn 62 cents for each dollar earned by men, a gender pay gap of approximately 38%.

- Full-time workers, aged 15 years and older: Among full-time, year-round workers aged 15 years and older in Swansea, males earned a median income of $76,189 and females earned $59,162, a 22% gender pay gap. Women earn 78 cents for each dollar earned by men in full-time roles, so a substantial income disparity persists even among full-time workers.

The gender pay gap was higher across all workers (38%), a group that includes non-full-time employment, than among full-time workers (22%). This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Swansea.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

Gender classifications include:

Male

Female

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.
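The two employment-type definitions above can be expressed as a small classifier. This is a sketch only: the function name is hypothetical, and the "other" label for people who work 35 or more hours per week but fewer than 50 weeks is an assumption, since the definitions above do not name that group.

```python
def classify_worker(hours_per_week, weeks_worked):
    """Classify a worker using the definitions quoted above."""
    if hours_per_week < 35:
        # Part-time: less than 35 hours per week during the previous calendar year.
        return "part-time"
    if weeks_worked >= 50:
        # Full-time, year-round: 35+ hours per week and 50+ weeks.
        return "full-time, year-round"
    # Assumption: full-time hours but fewer than 50 weeks fits neither definition.
    return "other"


print(classify_worker(40, 52))  # full-time, year-round
print(classify_worker(20, 52))  # part-time
print(classify_worker(40, 30))  # other
```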

Variables / Data Columns

Year: This column presents the data year. Expected values are 2010 to 2023

Male Total Income: Annual median income, for males regardless of work hours

Male FT Income: Annual median income, for males working full time, year-round

Male PT Income: Annual median income, for males working part time

Female Total Income: Annual median income, for females regardless of work hours

Female FT Income: Annual median income, for females working full time, year-round

Female PT Income: Annual median income, for females working part time

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Swansea median household income by race.
Data from: S1 Dataset -
plos.figshare.com
zip
Updated Jun 21, 2024
Zhenwei Li; Shihai Zhang; Chongnian Qu; Zimiao Zhang; Feng Sun (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0304819.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0304819.s001
Dataset updated
Jun 21, 2024
Dataset provided by
PLOS ONE
Authors
Zhenwei Li; Shihai Zhang; Chongnian Qu; Zimiao Zhang; Feng Sun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Solar cells are playing a significant role in aerospace equipment. In view of the surface defect characteristics in the manufacturing process of solar cells, the common surface defects are divided into three categories, which include difficult-detecting defects (mismatch), general defects (bubble, glass-crack and cell-crack) and easy-detecting defects (glass-upside-down). Corresponding to different types of defects, the deep learning model with different optimization methods and a classification detection method based on multi-models fusion are proposed in the paper. In the proposed model, in order to solve the mismatch problem between the default anchor boxes size of YOLOv5s model and the extreme scale of the battery mismatch defect label boxes, the K-means algorithm was adopted to re-cluster the dedicated anchor boxes for the mismatch defect label boxes. In order to improve the comprehensive detection accuracy of YOLOv5s model for the general defects, the YOLOv5s model was also improved by the methods of image preprocessing, anchor box improving and detection head replacing. In order to ensure the recognition accuracy and improve the detection speed for easy-detecting defects, the lightweight classification network MobileNetV2 was also used to classify the cells with glass-upside-down defects. The experimental results show that the proposed optimization model and classification detection method can significantly improve the defect detection precision. Respectively, the detection precision for mismatch, bubble, glass-crack and cell-crack defects are up to 95.64%, 91.8%, 93.1% and 98.0%. By using lightweight model to train the glass-upside-down defect dataset, the average classification accuracy reaches 100% and the detection speed reaches 13.29 frames per second. 
The comparison experiments show that the proposed model has a great improvement in detection accuracy compared with the original model, and the defect detection speed of lightweight classification network is improved more obviously, which confirms the effectiveness of the proposed optimization model and the multi-defect classification detection method for solar cells defect detection.
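The anchor re-clustering step mentioned in the description can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes label boxes are given as (width, height) pairs, uses 1 - IoU as the distance (a common choice for YOLO anchor clustering that the description does not confirm), and uses a simple deterministic initialization. The function names and the sample boxes are hypothetical.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), both placed at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, max_iter=100):
    """Cluster (width, height) boxes into k anchors using 1 - IoU as distance."""
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    # Deterministic init: k boxes spread evenly across the area-sorted list.
    step = (len(ordered) - 1) / max(k - 1, 1)
    anchors = [ordered[round(i * step)] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Highest IoU is the same as smallest (1 - IoU) distance.
        new_assignment = [
            max(range(k), key=lambda j: iou_wh(b, anchors[j])) for b in boxes
        ]
        if new_assignment == assignment:
            break  # converged: no box changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == j]
            if members:  # keep the old anchor if a cluster is empty
                anchors[j] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical label boxes with extreme aspect ratios (tall/narrow and wide/flat).
sample_boxes = [(10, 100), (12, 110), (11, 90), (200, 20), (220, 22), (210, 18)]
print(kmeans_anchors(sample_boxes, k=2))  # [(11.0, 100.0), (210.0, 20.0)]
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when the label boxes differ strongly in scale.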
Sinton, TX annual median income by work experience and sex dataset: Aged...
neilsberg.com
csv, json
Updated Feb 27, 2025
Neilsberg Research (2025). Sinton, TX annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a5372e97-f4ce-11ef-8577-3860777c1fe6/
Available download formats: csv, json
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Sinton, Texas
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Sinton. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

Key observations: Insights from 2023

Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

- All workers, aged 15 years and older: In Sinton, the median income for all workers aged 15 years and older, regardless of work hours, was $33,381 for males and $17,755 for females. Women, regardless of work hours, earn 53 cents for each dollar earned by men, a gender pay gap of approximately 47%.

- Full-time workers, aged 15 years and older: Among full-time, year-round workers aged 15 years and older in Sinton, males earned a median income of $45,032 and females earned $34,583, a 23% gender pay gap. Women earn 77 cents for each dollar earned by men in full-time roles, so a substantial income disparity persists even among full-time workers.

The gender pay gap was higher across all workers (47%), a group that includes non-full-time employment, than among full-time workers (23%). This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Sinton.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

Gender classifications include:

Male

Female

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Variables / Data Columns

Year: This column presents the data year. Expected values are 2010 to 2023

Male Total Income: Annual median income, for males regardless of work hours

Male FT Income: Annual median income, for males working full time, year-round

Male PT Income: Annual median income, for males working part time

Female Total Income: Annual median income, for females regardless of work hours

Female FT Income: Annual median income, for females working full time, year-round

Female PT Income: Annual median income, for females working part time

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Sinton median household income by race.
Comparison of classifier performance across two data sets.
plos.figshare.com
xls
Updated Jun 2, 2023
Joseph Schlecht; Matthew E. Kaplan; Kobus Barnard; Tatiana Karafet; Michael F. Hammer; Nirav C. Merchant (2023). Comparison of classifier performance across two data sets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1000093.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1000093.t003
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS Computational Biology
Authors
Joseph Schlecht; Matthew E. Kaplan; Kobus Barnard; Tatiana Karafet; Michael F. Hammer; Nirav C. Merchant
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The top table shows the average classifier performance for cross-validation on the 9-locus public STR data. The bottom table is the performance for the same test, but on a 9-locus subset of our ground-truth training data. While overall performance is lower than the 15-locus cross-validation test on our ground-truth data (Table 1), the two data sets perform similarly here, indicating that increasing the number of markers in the data set can significantly improve performance.
Annotation times for the external testing dataset.
plos.figshare.com
xls
Updated Dec 3, 2024
Henrik Mustonen; Antti Isosalo; Minna Nortunen; Mika Nevalainen; Miika T. Nieminen; Heikki Huhta (2024). Annotation times for the external testing dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0313126.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0313126.t004
Dataset updated
Dec 3, 2024
Dataset provided by
PLOS ONE
Authors
Henrik Mustonen; Antti Isosalo; Minna Nortunen; Mika Nevalainen; Miika T. Nieminen; Heikki Huhta
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Annotation times for the external testing dataset.
Crossett, AR annual median income by work experience and sex dataset: Aged...
neilsberg.com
csv, json
Updated Feb 27, 2025
Neilsberg Research (2025). Crossett, AR annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a50e0628-f4ce-11ef-8577-3860777c1fe6/
Available download formats: json, csv
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Crossett, Arkansas
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Crossett. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

Key observations: Insights from 2023

Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:
- All workers, aged 15 years and older: In Crossett, the median income for all workers aged 15 years and older, regardless of work hours, was $31,618 for males and $19,701 for females. These figures show a substantial gender-based income gap: women, regardless of work hours, earn 62 cents for each dollar earned by men, a gap of approximately 38%.
- Full-time workers, aged 15 years and older: Among full-time, year-round workers aged 15 years and older, males earned a median income of $54,148, while females earned $39,318, a 27% gender pay gap. Women therefore earn 73 cents for each dollar earned by men in full-time roles, so a substantial disparity persists even among full-time workers.
The gender pay gap is larger across all workers (38%), a group that includes non-full-time employment, than among full-time workers (27%). This suggests that full-time employment offers a more equitable income scenario for women compared to other employment patterns in Crossett.
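The cents-per-dollar and pay-gap percentages quoted above can be reproduced from the four median incomes. The following is a minimal sketch, assuming the gap is defined as one minus the female-to-male ratio of median incomes and that results are rounded to whole numbers; the function name `pay_gap` is hypothetical and not part of the dataset.

```python
def pay_gap(male_median: float, female_median: float) -> tuple[int, int]:
    """Return (cents earned by women per dollar earned by men, pay gap in percent).

    Assumption: gap = 1 - female/male, with both values rounded to whole numbers.
    """
    ratio = female_median / male_median
    return round(ratio * 100), round((1 - ratio) * 100)


# Median incomes quoted in the observations above (2023 inflation-adjusted dollars).
all_workers = pay_gap(31_618, 19_701)
full_time = pay_gap(54_148, 39_318)

print(all_workers)  # (62, 38)
print(full_time)    # (73, 27)
```

Under this definition, the smaller full-time gap follows directly from the higher female-to-male ratio among full-time, year-round workers.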

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

Gender classifications include:

Male

Female

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Variables / Data Columns

Year: This column presents the data year. Expected values are 2010 to 2023

Male Total Income: Annual median income, for males regardless of work hours

Male FT Income: Annual median income, for males working full time, year-round

Male PT Income: Annual median income, for males working part time

Female Total Income: Annual median income, for females regardless of work hours

Female FT Income: Annual median income, for females working full time, year-round

Female PT Income: Annual median income, for females working part time

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Crossett median household income by race.
