100+ datasets found
  1. Examples of studies that used presence-absence data to compute Jaccard’s...

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Kumar P. Mainali; Sharon Bewick; Peter Thielen; Thomas Mehoke; Florian P. Breitwieser; Shishir Paudel; Arjun Adhikari; Joshua Wolfe; Eric V. Slud; David Karig; William F. Fagan (2023). Examples of studies that used presence-absence data to compute Jaccard’s similarity index (J) for determining similarity between systems (e.g., between taxa-pairs, between sites, between markets) where the statistical significance of J is faulty and the use of observed value of J as a similarity metric is flawed. [Dataset]. http://doi.org/10.1371/journal.pone.0187132.t002
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kumar P. Mainali; Sharon Bewick; Peter Thielen; Thomas Mehoke; Florian P. Breitwieser; Shishir Paudel; Arjun Adhikari; Joshua Wolfe; Eric V. Slud; David Karig; William F. Fagan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Examples of studies that used presence-absence data to compute Jaccard’s similarity index (J) to determine the similarity between systems (e.g., between taxa pairs, between sites, between markets), in which the reported statistical significance of J is faulty and the use of the observed value of J as a similarity metric is flawed.
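    For reference, the observed index is J = (number of units where both systems are present) / (number of units where at least one is present). The sketch below computes it and, to show why an observed J is hard to read on its own, compares it against a generic permutation null that keeps each system's number of presences fixed. This permutation test is a stand-in chosen for illustration, not the procedure proposed by the dataset's authors; the function names and the presence-absence vectors are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard's J for two equal-length presence (1) / absence (0) vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)   # shared presences
    either = sum(1 for x, y in zip(a, b) if x or y)  # units where either is present
    return both / either if either else 0.0


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Generic one-sided permutation test (hypothetical helper, not the authors' method):
    the share of random reshuffles of b whose J is at least the observed J."""
    rng = random.Random(seed)
    observed = jaccard(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps b's number of presences, randomizes where they occur
        if jaccard(a, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical presence-absence vectors for two taxa across eight sites.
taxon_a = [1, 1, 1, 0, 0, 1, 0, 1]
taxon_b = [1, 1, 0, 0, 0, 1, 0, 1]
print(jaccard(taxon_a, taxon_b))  # 0.8
print(permutation_p_value(taxon_a, taxon_b, n_perm=2_000))
```

    Here J = 0.8 looks high, yet the permutation p-value should come out near 5/70 ≈ 0.07: J reaches 0.8 only when all four of taxon_b's presences land on taxon_a's five occupied sites, which happens by chance in 5 of the 70 possible placements.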

  2. Data from: Nursing Home Compare

    • catalog.data.gov
    • data.va.gov
    • +2 more
    Updated May 1, 2021
    Department of Veterans Affairs (2021). Nursing Home Compare [Dataset]. https://catalog.data.gov/dataset/nursing-home-compare-ed7b0
    Dataset updated
    May 1, 2021
    Dataset provided by
    United States Department of Veterans Affairs (http://va.gov/)
    Description

    Nursing Home Compare has detailed information about every Medicare and Medicaid nursing home in the country. A nursing home is a place for people who can’t be cared for at home and need 24-hour nursing care. These are the official datasets used on the Medicare.gov Nursing Home Compare Website provided by the Centers for Medicare & Medicaid Services. These data allow you to compare the quality of care at every Medicare and Medicaid-certified nursing home in the country, including over 15,000 nationwide.

  3. Data from: Robust Leave-One-Out Cross-Validation for High-Dimensional...

    • tandf.figshare.com
    pdf
    Updated Nov 9, 2023
    Luca Alessandro Silva; Giacomo Zanella (2023). Robust Leave-One-Out Cross-Validation for High-Dimensional Bayesian Models [Dataset]. http://doi.org/10.6084/m9.figshare.24167959.v2
    Available download formats: pdf
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Luca Alessandro Silva; Giacomo Zanella
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Leave-one-out cross-validation (LOO-CV) is a popular method for estimating out-of-sample predictive accuracy. However, computing LOO-CV criteria can be computationally expensive because the model must be fitted multiple times. In the Bayesian context, importance sampling provides a possible solution, but classical approaches can easily produce estimators whose asymptotic variance is infinite, making them potentially unreliable. Here we propose and analyze a novel mixture estimator to compute Bayesian LOO-CV criteria. Our method retains the simplicity and computational convenience of classical approaches while guaranteeing finite asymptotic variance of the resulting estimators. Both theoretical and numerical results are provided to illustrate the improved robustness and efficiency. The computational benefits are particularly significant in high-dimensional problems, making it possible to perform Bayesian LOO-CV for a broader range of models and for datasets with highly influential observations. The proposed methodology is easily implementable in standard probabilistic programming software and has a computational cost roughly equivalent to fitting the original model once. Supplementary materials for this article are available online.
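    To make the contrast concrete, the sketch below compares brute-force LOO (refit once per held-out point) with the classical importance-sampling estimator, which reuses draws from the full-data posterior and takes the harmonic mean of p(y_i | θ). This classical estimator is the kind whose variance can be infinite; the authors' mixture estimator is not shown. The toy model (normal mean with known variance and a conjugate N(0, 100) prior), the data, and all function names are assumptions chosen so the exact answer is available in closed form.

```python
import math
import random


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def posterior(ys, sigma2=1.0, prior_var=100.0):
    """Conjugate posterior of a normal mean (known variance sigma2, N(0, prior_var) prior)."""
    precision = 1.0 / prior_var + len(ys) / sigma2
    return (sum(ys) / sigma2) / precision, 1.0 / precision


def exact_loo(ys, i, sigma2=1.0):
    """Brute-force log p(y_i | y_-i): refit without observation i, then predict it."""
    mean, var = posterior(ys[:i] + ys[i + 1:], sigma2)
    return normal_logpdf(ys[i], mean, sigma2 + var)


def is_loo(ys, i, draws, sigma2=1.0):
    """Classical importance-sampling LOO: reuse full-data posterior draws and take
    the harmonic mean of p(y_i | theta_s), so the model is fitted only once."""
    inv_lik = [math.exp(-normal_logpdf(ys[i], th, sigma2)) for th in draws]
    return -math.log(sum(inv_lik) / len(inv_lik))


ys = [0.1 * k - 1.0 for k in range(20)]  # hypothetical toy data
mean, var = posterior(ys)
rng = random.Random(1)
draws = [rng.gauss(mean, math.sqrt(var)) for _ in range(4000)]
for i in (0, 10, 19):
    print(i, round(exact_loo(ys, i), 3), round(is_loo(ys, i, draws), 3))
```

    In this well-behaved toy case the two columns agree closely; with highly influential observations the inverse-likelihood weights become heavy-tailed, which is the failure mode the description refers to.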

  4. ACS 1-Year Comparison Profiles

    • datasets.ai
    • catalog.data.gov
    Updated Sep 19, 2024
    + more versions
    Department of Commerce (2024). ACS 1-Year Comparison Profiles [Dataset]. https://datasets.ai/datasets/acs-1-year-comparison-profiles-ec468
    Available download formats: 2
    Dataset updated
    Sep 19, 2024
    Dataset provided by
    United States Department of Commerce (http://www.commerce.gov/)
    Authors
    Department of Commerce
    Description

    The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. Much of the ACS data provided on the Census Bureau's Web site are available separately by age group, race, Hispanic origin, and sex. Summary files, Subject tables, Data profiles, and Comparison profiles are available for the nation, all 50 states, the District of Columbia, Puerto Rico, every congressional district, every metropolitan area, and all counties and places with populations of 65,000 or more. Comparison profiles are similar to data profiles but also include comparisons with past-year data. The current year data are compared with each of the last four years of data and include statistical significance testing. There are over 1,000 variables in this dataset.
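    A common way to test whether two ACS estimates differ, described in Census Bureau guidance, converts each published 90% margin of error (MOE) to a standard error (SE = MOE / 1.645) and compares z = (X1 − X2) / √(SE1² + SE2²) against 1.645. Whether the Comparison Profiles apply exactly this procedure is an assumption here; the function name and the income figures are hypothetical.

```python
import math

Z90 = 1.645  # ACS margins of error are published at the 90% confidence level


def compare_estimates(est1, moe1, est2, moe2, z_crit=Z90):
    """Return (z, significant) for the difference between two ACS estimates."""
    se1, se2 = moe1 / Z90, moe2 / Z90  # convert each MOE to a standard error
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, abs(z) > z_crit


# Hypothetical median incomes (current year vs. a past year) with their MOEs.
print(compare_estimates(52_000, 1_500, 50_000, 1_400))  # z ≈ 1.60: not significant
print(compare_estimates(55_000, 1_500, 50_000, 1_400))  # z ≈ 4.01: significant
```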

  5. Data from: S8 Fig -

    • plos.figshare.com
    zip
    Updated Aug 3, 2023
    + more versions
    Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc (2023). S8 Fig - [Dataset]. http://doi.org/10.1371/journal.pone.0289211.s009
    Available download formats: zip
    Dataset updated
    Aug 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Deep learning (DL) techniques have attracted tremendous interest in medical imaging, particularly the use of convolutional neural networks (CNNs) to develop automated diagnostic tools. Because retinal fundus images are easy to acquire non-invasively, they are particularly amenable to such automated approaches. Recent work on analyzing fundus images with CNNs relies on access to massive datasets of hundreds of thousands of images for training and validation. However, data residency and data privacy restrictions limit the applicability of this approach in medical settings where patient confidentiality is mandatory. Here, we report the performance of DL on small datasets for classifying patient sex from fundus images, a trait that until recently was thought not to be present or quantifiable in fundus images. Specifically, we fine-tune a ResNet-152 model whose last layer has been replaced by a fully connected layer for binary classification. We carried out several experiments to assess performance in the small-dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to only a 25% decrease in performance despite a nearly 1000-fold decrease in dataset size compared to prior results in the literature. Our results show that binary classification, even for a hard task such as sex categorization from retinal fundus images, is possible with very small datasets. Our domain adaptation results show that models trained on one distribution of images may generalize well to an independent external source, as in the case of models trained on DOVS and tested on ODIR. Our results also show that eliminating poor-quality images may hamper training of the CNN by shrinking the already small dataset even further. Nevertheless, using high-quality images may be an important factor, as evidenced by the superior generalizability of results in the domain adaptation experiments. Finally, our work shows that ensembling is an important tool for maximizing the performance of deep CNNs when development datasets are small.
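    The description reports AUC and credits ensembling, but does not say how model outputs were combined. The sketch below assumes the simplest option, averaging per-image predicted probabilities across models, and computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The labels, scores, and function names are hypothetical toy values, picked so the two models make different mistakes.

```python
def auc(labels, scores):
    """ROC AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def ensemble(prob_lists):
    """Average the predicted probabilities of several models, image by image."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


labels = [1, 1, 0, 0]              # hypothetical ground truth (1 = one sex, 0 = the other)
model_a = [0.9, 0.4, 0.6, 0.2]     # hypothetical per-image probabilities
model_b = [0.5, 0.8, 0.3, 0.7]
print(auc(labels, model_a), auc(labels, model_b))   # 0.75 0.75
print(auc(labels, ensemble([model_a, model_b])))    # 1.0
```

    Averaging helps here only because the two models' errors fall on different images; with strongly correlated models the gain would be smaller.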

  6. Data from: A semi-simulated EEG/EOG dataset for the comparison of EOG...

    • researchdata.aston.ac.uk
    • data.mendeley.com
    Updated May 20, 2016
    Manousos Klados; Panagiotis Bamidis (2016). A semi-simulated EEG/EOG dataset for the comparison of EOG artifact rejection techniques [Dataset]. http://doi.org/10.17632/wb6yvr725d.3
    Dataset updated
    May 20, 2016
    Authors
    Manousos Klados; Panagiotis Bamidis
    Description

    This work presents a semi-simulated EEG dataset in which artifact-free EEG signals are manually contaminated with ocular artifacts following the model proposed by [1]. The key feature of this dataset is that it contains the pre-contamination EEG signals, so the brain signals underlying the EOG artifacts are known and the performance of any artifact rejection technique can be objectively assessed. The main difference from other datasets (e.g., see [2,3]) is that it focuses only on EOG artifacts and contaminates the artifact-free EEGs with a realistic model rather than a random procedure.
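    The value of keeping the pre-contamination signal is that any correction can be scored against it. The sketch below assumes a linear additive contamination model (EEG + a·VEOG + b·HEOG per sample), applies a simple least-squares regression correction as one example technique, and measures root-mean-square error against the known clean signal. The model form, coefficients, synthetic signals, and function names are all illustrative assumptions, not the dataset's actual parameters.

```python
import math


def contaminate(clean, veog, heog, a, b):
    """Assumed linear contamination model: EEG + a*VEOG + b*HEOG, sample by sample."""
    return [x + a * v + b * h for x, v, h in zip(clean, veog, heog)]


def regress_out(signal, eog):
    """Simple least-squares correction: subtract beta*eog, beta = <signal, eog> / <eog, eog>."""
    beta = sum(s * e for s, e in zip(signal, eog)) / sum(e * e for e in eog)
    return [s - beta * e for s, e in zip(signal, eog)]


def rmse(reference, estimate):
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference))


n = 200
clean = [math.sin(math.pi * t / 10) for t in range(n)]               # stand-in "brain" signal
veog = [50 * math.exp(-(((t - 100) / 15) ** 2)) for t in range(n)]   # one blink-like VEOG burst
heog = [0.0] * n                                                      # no horizontal movement here
contaminated = contaminate(clean, veog, heog, a=0.3, b=0.0)
corrected = regress_out(contaminated, veog)

# Because the pre-contamination EEG is known, the correction can be scored objectively.
print(rmse(clean, contaminated))  # error before correction
print(rmse(clean, corrected))     # error after correction
```

    In this toy case the synthetic EEG is orthogonal to the blink, so regression recovers it almost exactly; on real recordings EEG and EOG overlap, and the known clean signal is what reveals how much brain activity a technique removes along with the artifact.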

    [1] T. Elbert, W. Lutzenberger, B. Rockstroh, N. Birbaumer, Removal of ocular artifacts from the EEG--a biophysical approach to the EOG., Electroencephalogr. Clin. Neurophysiol. 60 (1985) 455–63. http://www.ncbi.nlm.nih.gov/pubmed/2580697 (accessed April 10, 2013).

    [2] X. Yong, M. Fatourechi, R.K. Ward, G.E. Birch, Automatic artefact removal in a self-paced hybrid brain-computer interface system, J. Neuroeng. Rehabil. 9 (2012) 50. doi:10.1186/1743-0003-9-50.

    [3] A.K. Abdullah, C.Z. Zhang, A.A.A. Abdullah, S. Lian, Automatic Extraction System for Common Artifacts in EEG Signals Based on Evolutionary Stone’s BSS Algorithm, Math. Probl. Eng. 2014 (2014) 1–25. doi:10.1155/2014/324750.

  7. Stanford, IL annual median income by work experience and sex dataset: Aged...

    • neilsberg.com
    csv, json
    Updated Feb 27, 2025
    + more versions
    Neilsberg Research (2025). Stanford, IL annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a539041b-f4ce-11ef-8577-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Stanford, Illinois
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Stanford. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

    Key observations: Insights from 2023

    Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

    - All workers, aged 15 years and older: In Stanford, the median income for all workers aged 15 years and older, regardless of work hours, was $55,625 for males and $36,902 for females.

    These income figures highlight a substantial gender-based income gap in Stanford. Women, regardless of work hours, earn 66 cents for each dollar earned by men. This significant gender pay gap, approximately 34%, underscores concerning gender-based income inequality in the village of Stanford.

    - Full-time workers, aged 15 years and older: In Stanford, among full-time, year-round workers aged 15 years and older, males earned a median income of $60,887, while females earned $46,389, leading to a 24% gender pay gap among full-time workers. This means that women earn 76 cents for each dollar earned by men in full-time roles. Although this gap is narrower than the 34% gap across all workers, it still represents a substantial income disparity: women working full-time earn considerably less than men in the same roles.

    The gender pay gap was therefore larger across all roles, including non-full-time employment (about 34%), than among full-time workers alone (24%). This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Stanford.
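    The cents-on-the-dollar and gap figures follow directly from the two medians: cents = 100 × female median / male median, and gap = 100 − cents. A minimal sketch reproducing the 2023 Stanford figures quoted above (the function name is hypothetical):

```python
def pay_gap(male_median, female_median):
    """Return (women's cents per man's dollar, gender pay gap in %), both rounded."""
    cents = 100 * female_median / male_median
    return round(cents), round(100 - cents)


print(pay_gap(55_625, 36_902))  # all workers: (66, 34)
print(pay_gap(60_887, 46_389))  # full-time, year-round: (76, 24)
```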

    Content

    When available, the data consist of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

    Gender classifications include:

    • Male
    • Female

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

    Variables / Data Columns

    • Year: This column presents the data year. Expected values are 2010 to 2023
    • Male Total Income: Annual median income, for males regardless of work hours
    • Male FT Income: Annual median income, for males working full time, year-round
    • Male PT Income: Annual median income, for males working part time
    • Female Total Income: Annual median income, for females regardless of work hours
    • Female FT Income: Annual median income, for females working full time, year-round
    • Female PT Income: Annual median income, for females working part time

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Stanford median household income by race. You can refer to it here.

  8. Rumford, Maine annual median income by work experience and sex dataset: Aged...

    • neilsberg.com
    csv, json
    Updated Feb 27, 2025
    + more versions
    Neilsberg Research (2025). Rumford, Maine annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a5349471-f4ce-11ef-8577-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Maine, Rumford
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Rumford town. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

    Key observations: Insights from 2023

    Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

    - All workers, aged 15 years and older: In Rumford town, the median income for all workers aged 15 years and older, regardless of work hours, was $34,124 for males and $22,643 for females.

    These income figures highlight a substantial gender-based income gap in Rumford town. Women, regardless of work hours, earn 66 cents for each dollar earned by men. This significant gender pay gap, approximately 34%, underscores concerning gender-based income inequality in the town of Rumford.

    - Full-time workers, aged 15 years and older: In Rumford town, among full-time, year-round workers aged 15 years and older, males earned a median income of $60,964, while females earned $43,807, leading to a 28% gender pay gap among full-time workers. This means that women earn 72 cents for each dollar earned by men in full-time roles. Although this gap is narrower than the 34% gap across all workers, it still represents a substantial income disparity: women working full-time earn considerably less than men in the same roles.

    The gender pay gap was therefore larger across all roles, including non-full-time employment (about 34%), than among full-time workers alone (28%). This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Rumford town.

    Content

    When available, the data consist of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

    Gender classifications include:

    • Male
    • Female

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

    Variables / Data Columns

    • Year: This column presents the data year. Expected values are 2010 to 2023
    • Male Total Income: Annual median income, for males regardless of work hours
    • Male FT Income: Annual median income, for males working full time, year-round
    • Male PT Income: Annual median income, for males working part time
    • Female Total Income: Annual median income, for females regardless of work hours
    • Female FT Income: Annual median income, for females working full time, year-round
    • Female PT Income: Annual median income, for females working part time
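    A minimal sketch of reading the columns listed above, computing the female-to-male full-time income ratio for each year. It assumes the CSV header strings match the documented column names exactly and that values are plain numbers (no "$" or thousands separators); both are unverified assumptions about the file, and the function name is hypothetical. The one-row sample is built from the 2023 Rumford figures quoted above, not taken from the real file.

```python
import csv
import io


def full_time_ratios(csv_text):
    """Female/male full-time median income ratio per year.

    Assumes the header names match the documented columns exactly and that
    values are plain numbers; rows with a missing estimate are skipped."""
    ratios = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        male, female = row["Male FT Income"], row["Female FT Income"]
        if male and female:
            ratios[int(row["Year"])] = float(female) / float(male)
    return ratios


# One-row excerpt built from the 2023 Rumford figures quoted above (not the real file).
sample = "Year,Male FT Income,Female FT Income\n2023,60964,43807\n"
print({year: round(r, 2) for year, r in full_time_ratios(sample).items()})  # {2023: 0.72}
```

    For the actual download, the same function can be called on the file contents read with open(..., newline="").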

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Rumford town median household income by race. You can refer to it here.

  9. LLMWorldOfWords/LWOW: First release

    • zenodo.org
    zip
    Updated Apr 30, 2025
    Katherine Elizabeth Abramski; Riccardo Improta; Giulio Rossetti; Massimo Stella (2025). LLMWorldOfWords/LWOW: First release [Dataset]. http://doi.org/10.5281/zenodo.15222294
    Available download formats: zip
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katherine Elizabeth Abramski; Riccardo Improta; Giulio Rossetti; Massimo Stella
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 15, 2025
    Area covered
    Lviv
    Description

    The "LLM World of Words" (LWOW) [1] is a collection of datasets of English free association norms generated by various large language models (LLMs). Currently, the collection consists of datasets generated by Mistral, LLaMA3, and Claude Haiku. The datasets are modeled after the "Small World of Words" (SWOW) (https://smallworldofwords.org/en/project/) [2] English free association norms, generated by humans, consisting of over 12,000 cue words and over 3 million responses. The purpose of the LWOW datasets is to provide a way to investigate various aspects of the semantic memory of LLMs using an approach that has been applied extensively for investigating the semantic memory of humans. These datasets, together with the SWOW dataset, can be used to gain insights about similarities and differences in the language structures possessed by humans and LLMs.

    What are free associations?

    Free associations are implicit mental connections between words or concepts. They are typically elicited by presenting humans (or AI agents) with a cue word and asking them to respond with the first words that come to mind. The responses represent implicit associations that connect different concepts in the mind, reflecting the semantic representations that underlie patterns of thought, memory, and language. For example, given the cue word "woman", a common free association response might be "man", reflecting the associative mental relation between these two concepts.

    How can they be used?

    Free associations have been extensively used in cognitive psychology and linguistics as a tool for studying language and cognitive information processing. They provide a way for researchers to understand how conceptual knowledge is organized and accessed in the mind. Free associations are often used to build network models of semantic memory by connecting cue words to their responses. When thousands of cues and responses are connected in this way, the result is a large network model that represents the complex organization of semantic knowledge. Such models enable the investigation of cognitive processes that take place within semantic memory, and can be used to study a variety of cognitive phenomena such as language learning, creativity, personality traits, and cognitive biases.
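    A minimal sketch of turning cue-response pairs into a network, assuming an undirected graph whose edge weights count how often two words were linked in either direction. Whether the LWOW networks are directed, weighted, or filtered this way is not stated here, so treat these choices, the function name, and the toy pairs as hypothetical.

```python
from collections import Counter


def build_network(pairs):
    """Undirected weighted network from (cue, response) pairs: the weight of an
    edge is the number of times the two words were linked, in either direction."""
    weights = Counter()
    for cue, response in pairs:
        if cue != response:  # ignore self-associations
            weights[tuple(sorted((cue, response)))] += 1
    return weights


pairs = [("woman", "man"), ("man", "woman"), ("woman", "girl"), ("doctor", "nurse")]
for (u, v), w in build_network(pairs).items():
    print(u, v, w)
```

    Storing each edge under its alphabetically sorted word pair is what merges "woman → man" and "man → woman" into a single edge of weight 2.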

    Validation of the datasets with semantic priming

    The LWOW datasets were validated using data from the Semantic Priming Project (https://www.montana.edu/attmemlab/spp.html) [3], which implements a lexical decision task (LDT) to study semantic priming. The semantic priming effect is the cognitive phenomenon whereby a target word (e.g. nurse) is recognized more easily when it is preceded by a related prime word (e.g. doctor) than by an unrelated prime word (e.g. doctrine). We simulated the semantic priming effect within network models of semantic memory built from both the LWOW and the SWOW free association norms by implementing spreading activation processes within the networks [4]. We found that the final activation levels of prime-target pairs correlated significantly with reaction time data for the same prime-target pairs from the LDT. Specifically, the activation of a target node (e.g. nurse) is higher when a related prime node (e.g. doctor) is activated than when an unrelated prime node (e.g. doctrine) is activated. These results support the validity of the datasets and show how they can be used to investigate cognitive and linguistic phenomena in LLMs.
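    The priming simulation can be pictured with a deliberately simplified spreading-activation rule: at each step every node keeps a fixed fraction of its activation and splits the rest equally among its neighbours. This is a hypothetical illustration, not the spreadr algorithm the authors run in FA_spreadr.r; the spread() function, its retention and steps parameters, and the toy network are all assumptions.

```python
def spread(adjacency, start, retention=0.5, steps=3):
    """Simplified spreading activation: each step, every node keeps `retention`
    of its activation and splits the rest equally among its neighbours.
    Total activation stays 1.0 (a node with no neighbours keeps everything)."""
    act = {node: 0.0 for node in adjacency}
    act[start] = 1.0  # activate the prime
    for _ in range(steps):
        new = {node: a * retention for node, a in act.items()}
        for node, a in act.items():
            nbrs = adjacency[node]
            if nbrs:
                share = a * (1 - retention) / len(nbrs)
                for nbr in nbrs:
                    new[nbr] += share
            else:
                new[node] += a * (1 - retention)
        act = new
    return act


# Hypothetical undirected toy network (each edge listed in both directions).
adjacency = {
    "doctor": ["nurse", "hospital"],
    "nurse": ["doctor", "hospital"],
    "hospital": ["doctor", "nurse", "building"],
    "building": ["hospital", "church"],
    "church": ["building", "doctrine"],
    "doctrine": ["church"],
}
print(spread(adjacency, "doctor")["nurse"])    # related prime
print(spread(adjacency, "doctrine")["nurse"])  # unrelated prime
```

    With the related prime "doctor", activation reaches "nurse" in one step; from "doctrine" the target is four links away, so after three steps it has received none.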

    Investigating gender biases

    To demonstrate how this dataset can be used to investigate gender biases in LLMs compared to humans, we conducted an analysis using network models of semantic memory built from both the LWOW and the SWOW free association norms. We applied a methodology that simulates semantic priming within the networks to measure the strength of association between pairs of concepts, for example, "woman" and "forceful" vs. "man" and "forceful". We applied this methodology using a set of female-related and male-related primes, and a set of female-related and male-related targets. This analysis revealed that certain adjectives like "forceful" and "strong" are more strongly associated with certain genders, shedding light on the types of stereotypical gender biases that both humans and LLMs possess.

    Technical notes

    The free associations were generated (either via API or locally, depending on the LLM) by providing each LLM with a set of cue words and the following prompt: "You will be provided with an input word. Write the first 3 words you associate to it separated by a comma." This prompt was repeated 100 times for each cue word, resulting in a dataset of 11,545 unique cue words and 3,463,500 total responses for each LLM.
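    The reported total is consistent with three responses per prompt: 11,545 cues × 100 repetitions × 3 words = 3,463,500. The sketch below checks that arithmetic and shows one plausible way to split a single comma-separated reply into words. The parsing rule (trim spaces and trailing periods, lower-case, keep at most three words) and the function name are assumptions; the cleaning actually applied in FA_data_Cleaning.py may differ.

```python
def parse_response(text, n=3):
    """Split one comma-separated reply into at most n lower-cased words
    (a hypothetical cleaning rule, not necessarily the one in FA_data_Cleaning.py)."""
    words = [w.strip().strip(".").lower() for w in text.split(",")]
    return [w for w in words if w][:n]


print(parse_response("Man, Girl, Mother."))  # ['man', 'girl', 'mother']

# Size check for one LLM: cues x repetitions x words per reply.
n_cues, repeats, words_per_reply = 11_545, 100, 3
print(n_cues * repeats * words_per_reply)  # 3463500
```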

    How to access and use the datasets

    The LWOW datasets for Mistral, Llama3, and Haiku can be found in the LWOW_datasets folder, which contains two subfolders. The .csv files of the processed cues and responses can be found in the processed_datasets folder while the .csv files of the edge lists of the semantic networks constructed from the datasets can be found in the graphs/edge_lists folder.

    Since the LWOW datasets are intended to be used in comparison to humans, we have further processed the original SWOW dataset to create a Human dataset that is aligned with the processing that we applied to the LWOW datasets. While this Human dataset is not included in this repository due to the license of the original SWOW dataset, it can be easily reproduced by running the code provided in the reproducibility folder. We strongly encourage you to generate this dataset, as it enables a direct comparison between humans and LLMs. The Human dataset can be generated with the following steps:

    • Go to the SWOW research page (https://smallworldofwords.org/en/project/research) [2] and download the English processed data (SWOW-EN18). Save this .csv file with the name "SWOW-EN.R100.csv" in the reproducibility/data/original_datasets folder.
    • Run the python file FA_data_Cleaning.py saved in the reproducibility folder. This will generate a .csv of the processed Human dataset, which will be saved in the reproducibility/data/processed_datasets folder. Note that this python script will also regenerate the .csv files of the processed LWOW datasets (the same that can be found in the LWOW_datasets/processed_datasets folder).
    • Run the python file FA_build_Networks.py saved in the reproducibility folder. This will generate a .csv of the edge list of the semantic network constructed from the Human dataset, which will be saved in the reproducibility/data/graphs/edge_lists folder. Note that this python script will also regenerate the .csv files of the edge lists of the LLM networks (the same that can be found in the LWOW_datasets/graphs/edge_lists folder). This python script will also produce igraph versions of all the semantic networks.

    How to reproduce the data and analyses

    To reproduce the analyses, first the required external files need to be downloaded:

    • Go to the SWOW research page (https://smallworldofwords.org/en/project/research) [2] and download the English data SWOW-EN18. Save this .csv file with the name "SWOW-EN.R100.csv" in the reproducibility/data/original_datasets folder.
    • Go to the Semantic Priming Project (https://www.montana.edu/attmemlab/spp.html) [3] and download the LDT Priming Data. Save this .csv file with the name "primingLDT_data.csv" in the reproducibility/data/LDT_analyses folder.

    Once the files are saved in the correct folders, follow the instructions in each script, which can be found in the reproducibility folder. The scripts should be run in the following order:

    1. FA_data_Generation.py: generates the raw LLM datasets
    2. FA_data_Cleaning.py: processes the original SWOW dataset and the raw LLM datasets
    3. FA_build_Networks.py: builds the semantic networks from the datasets
    4. FA_analyses_LDT_Gender.py and FA_spreadr.r: implement spreading activation processes within the networks to validate the datasets and investigate gender biases
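    The run order above can be expressed as a small driver script. This is a sketch only: the `PIPELINE` list and the `run_pipeline` helper are hypothetical, invoking the scripts with `python` and `Rscript` from inside the reproducibility folder is an assumption, and so is the relative order of the two step-4 scripts. Because each script may need to be configured by hand first, the driver defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path

# Script names and order come from the list above. The interpreters
# ("python", "Rscript") and the order within step 4 are assumptions.
PIPELINE = [
    ["python", "FA_data_Generation.py"],
    ["python", "FA_data_Cleaning.py"],
    ["python", "FA_build_Networks.py"],
    ["python", "FA_analyses_LDT_Gender.py"],
    ["Rscript", "FA_spreadr.r"],
]


def run_pipeline(workdir: Path = Path("reproducibility"), dry_run: bool = True) -> list[str]:
    """Run (or, by default, only list) the pipeline steps in order (hypothetical helper)."""
    executed = []
    for cmd in PIPELINE:
        line = " ".join(cmd)
        executed.append(line)
        if dry_run:
            print(f"[dry run] {line}")
            continue
        # check=True raises CalledProcessError if a step exits with an error,
        # so later steps never run on incomplete outputs.
        subprocess.run(cmd, cwd=workdir, check=True)
    return executed


if __name__ == "__main__":
    run_pipeline()
```

    Setting `dry_run=False` executes the steps with `subprocess.run(..., check=True)`, so the pipeline stops at the first script that exits with an error.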

    Do you want to know more? Read the Preprint!

    Abramski, K., et al. (2024). The "LLM World of Words" English free association norms generated by large language models (https://arxiv.org/abs/2412.01330)

    Funding & Legal

    • SoBigData.it, which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Notice no. 3264 of 28/12/2021.
    • EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).
    • The HumaneAI-Net project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 952026.
    • COGNOSCO grant funded by Università di Trento (Grant ID: PS 22_27).

    For speaking requests and enquiries, please contact:

    • Katherine Abramski : katherine.abramski@phd.unipi.it
    • Giulio Rossetti : giulio.rossetti@isti.cnr.it
    • Massimo Stella : massimo.stella-1@unitn.it

    References

    [1] Abramski, K., et al. (2024). The "LLM World of Words" English free association norms generated

  10. Dataset from A Phase 4 Comparison of Duloxetine Dosing Strategies in the...

    • data.niaid.nih.gov
    Updated Feb 22, 2025
    Cite
    Call 1-877-CTLILLY (1-877-285-4559) or 1-317-615-4559 Mon-Fri 9AM - 5PM Eastern time (UTC/GMT - 5hours, EST) (2025). Dataset from A Phase 4 Comparison of Duloxetine Dosing Strategies in the Treatment of Korean Patients With Major Depressive Disorder [Dataset]. http://doi.org/10.25934/00004464
    Dataset updated
    Feb 22, 2025
    Dataset provided by
    Eli Lilly and Company (https://lilly.com/)
    Authors
    Call 1-877-CTLILLY (1-877-285-4559) or 1-317-615-4559 Mon-Fri 9AM - 5PM Eastern time (UTC/GMT - 5hours, EST)
    Area covered
    Republic of Korea
    Variables measured
    Adverse Event, Clinical Global Impression, Gastrointestinal Sensation - Finding, Hamilton Rating Scale For Depression, Patient Global Impression of Improvement
    Description

    The purpose of this study is to assess nausea severity in response to four different drug dosing strategies of Duloxetine (30 mg with food, 60 mg with food, 30 mg without food, and 60 mg without food) in Korean patients with major depressive disorder (MDD).

  11. Sutherlin, OR annual median income by work experience and sex dataset: Aged...

    • neilsberg.com
    csv, json
    Updated Feb 27, 2025
    Cite
    Neilsberg Research (2025). Sutherlin, OR annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/sutherlin-or-income-by-gender/
    Available download formats: csv, json
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sutherlin
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Sutherlin. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

    Key observations: Insights from 2023

    Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

    - All workers, aged 15 years and older: In Sutherlin, the median income for all workers aged 15 years and older, regardless of work hours, was $42,190 for males and $25,678 for females.

    These income figures highlight a substantial gender-based income gap in Sutherlin. Women, regardless of work hours, earn 61 cents for each dollar earned by men. This significant gender pay gap, approximately 39%, underscores concerning gender-based income inequality in the city of Sutherlin.

    - Full-time workers, aged 15 years and older: In Sutherlin, among full-time, year-round workers aged 15 years and older, males earned a median income of $58,735, while females earned $41,581, a 29% gender pay gap among full-time workers. In other words, women earn 71 cents for each dollar earned by men in full-time roles. The gap among full-time workers is narrower than the gap across all workers, although women working full-time still face a substantial wage discrepancy compared to men in the same roles.

    The gender pay gap is therefore larger when all workers, including those not employed full-time, are considered than when only full-time workers are compared. This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Sutherlin.
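    The cents-per-dollar and gap figures above follow from a simple ratio of the two medians. The sketch below reproduces them; the function name `pay_gap` is illustrative, and defining the gap as 100 minus the rounded cents figure is an assumption that matches how the numbers above are reported.

```python
def pay_gap(male_median: float, female_median: float) -> tuple[int, int]:
    """Return (cents women earn per dollar men earn, gap in percent).

    The gap is taken as 100 minus the rounded cents figure (an assumption
    that matches how the observations above are reported).
    """
    cents = round(100 * female_median / male_median)
    return cents, 100 - cents


# 2023 medians for Sutherlin, OR, taken from the observations above.
all_workers = pay_gap(male_median=42_190, female_median=25_678)
full_time = pay_gap(male_median=58_735, female_median=41_581)
print(all_workers)  # (61, 39)
print(full_time)    # (71, 29)

# A smaller full-time gap means the gap narrows when only
# full-time, year-round workers are compared.
print(full_time[1] < all_workers[1])  # True
```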

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

    Gender classifications include:

    • Male
    • Female

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

    Variables / Data Columns

    • Year: This column presents the data year. Expected values are 2010 to 2023
    • Male Total Income: Annual median income, for males regardless of work hours
    • Male FT Income: Annual median income, for males working full time, year-round
    • Male PT Income: Annual median income, for males working part time
    • Female Total Income: Annual median income, for females regardless of work hours
    • Female FT Income: Annual median income, for females working full time, year-round
    • Female PT Income: Annual median income, for females working part time

    Good to know

    Margin of Error

    The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Sutherlin median household income by race.

  12. Income Distribution by Quintile: Mean Household Income in Great Neck Plaza,...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    Cite
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Great Neck Plaza, NY [Dataset]. https://www.neilsberg.com/research/datasets/949af105-7479-11ee-949f-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Great Neck Plaza, New York
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Great Neck Plaza, NY, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $15,044, while the mean income of the highest quintile (the 20% of households with the highest income) is $344,968. This indicates that the top earners earn about 23 times as much as the lowest earners.
    • Top 5%: The mean household income of the wealthiest households (top 5%) is $567,477, which is 164.50% of the highest-quintile mean (64.50% higher) and 3772.12% of the lowest-quintile mean (about 37.7 times as much).
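    The Top 5% comparison is a ratio, and "X% of" is easy to confuse with "X% higher than". A small Python sketch (the helper `compare_means` is hypothetical) computes both forms from the three means quoted above:

```python
def compare_means(a: float, b: float) -> tuple[float, float, float]:
    """Compare mean a with mean b.

    Returns (ratio a/b, a as a percentage of b, percent by which a exceeds b).
    """
    ratio = a / b
    return ratio, 100 * ratio, 100 * (ratio - 1)


# Mean household incomes for Great Neck Plaza, NY, from the observations above.
lowest, highest, top5 = 15_044, 344_968, 567_477

print(f"highest / lowest quintile: {highest / lowest:.1f}x")  # 22.9x, about 23 times
for label, base in [("highest quintile", highest), ("lowest quintile", lowest)]:
    _, pct_of, pct_higher = compare_means(top5, base)
    print(f"top 5% vs {label}: {pct_of:.2f}% of it, {pct_higher:.2f}% higher")
# top 5% vs highest quintile: 164.50% of it, 64.50% higher
# top 5% vs lowest quintile: 3772.12% of it, 3672.12% higher
```

    A mean that is 164.50% of another is 64.50% higher than it; the two forms always differ by exactly 100 percentage points.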

    Mean household income by quintiles in Great Neck Plaza, NY (in 2022 inflation-adjusted dollars)

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column shows the income level (as listed above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
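    The ACS publishes margins of error (MOEs) at the 90% confidence level, so a standard error can be recovered as MOE / 1.645, and two estimates can be compared with the Census Bureau's usual test for a difference. The variable list for this dataset does not include MOE columns, so the margins would have to come from the underlying ACS tables; the helper names and the estimates and margins in the example are hypothetical.

```python
import math

Z90 = 1.645  # ACS margins of error are published at the 90% confidence level


def standard_error(moe: float) -> float:
    """Convert a 90% ACS margin of error to a standard error."""
    return moe / Z90


def significantly_different(est1: float, moe1: float, est2: float, moe2: float) -> bool:
    """Return True if two estimates differ at the 90% confidence level."""
    se_diff = math.sqrt(standard_error(moe1) ** 2 + standard_error(moe2) ** 2)
    return abs(est1 - est2) / se_diff > Z90


# Hypothetical estimates and margins of error, for illustration only.
print(significantly_different(50_000, 8_000, 42_000, 7_000))  # False
print(significantly_different(50_000, 8_000, 30_000, 5_000))  # True
```

    Because 1.645 appears in both the conversion and the threshold, the test reduces to checking whether |est1 - est2| exceeds sqrt(moe1**2 + moe2**2).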

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Great Neck Plaza median household income.

  13. Presentation of the statistic indicators of our model for the 187 images of...

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Cristian Crisosto; Andreas Voskrebenzev; Marcel Gutberlet; Filip Klimeš; Till F. Kaireit; Gesa Pöhler; Tawfik Moher; Lea Behrendt; Robin Müller; Maximilian Zubke; Frank Wacker; Jens Vogel-Claussen (2023). Presentation of the statistic indicators of our model for the 187 images of the test dataset without consolidations and the 38 images of the test dataset with consolidations; the mean Sørensen-Dice similarity (SDC ± standard deviation), p-values distribution of the SD and the mean Hausdorff (HD) distance coefficient with the corresponding p-values distribution of the HD. [Dataset]. http://doi.org/10.1371/journal.pone.0285378.t002
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Cristian Crisosto; Andreas Voskrebenzev; Marcel Gutberlet; Filip Klimeš; Till F. Kaireit; Gesa Pöhler; Tawfik Moher; Lea Behrendt; Robin Müller; Maximilian Zubke; Frank Wacker; Jens Vogel-Claussen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Presentation of the statistical indicators of our model for the 187 images of the test dataset without consolidations and the 38 images of the test dataset with consolidations: the mean Sørensen-Dice similarity coefficient (SDC ± standard deviation), the p-value distribution of the SD, and the mean Hausdorff distance (HD) with the corresponding p-value distribution of the HD.
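    For readers unfamiliar with the two metrics in this table, the sketch below gives their textbook definitions on small binary masks represented as sets of pixel coordinates. It is not the authors' evaluation code: details such as pixel spacing, 3D handling, or a percentile variant of the Hausdorff distance are unknown here, and the masks are toy examples.

```python
import math


def dice(a: set, b: set) -> float:
    """Sørensen-Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention: two empty masks count as a perfect match
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a: set, b: set) -> float:
    """Symmetric Hausdorff distance between two non-empty sets of (row, col) points."""

    def directed(p: set, q: set) -> float:
        # Largest distance from a point in p to its nearest neighbour in q.
        return max(min(math.dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


# Toy 2x2 masks, offset by one column (hypothetical, not study data).
prediction = {(0, 0), (0, 1), (1, 0), (1, 1)}
ground_truth = {(0, 1), (0, 2), (1, 1), (1, 2)}
print(dice(prediction, ground_truth))       # 0.5
print(hausdorff(prediction, ground_truth))  # 1.0
```

    The brute-force Hausdorff loop is quadratic in the number of pixels; for full-size masks, `scipy.spatial.distance.directed_hausdorff` or a distance-transform approach is more practical.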

  14. Data from: Variable KOC and Poor-Quality Data Sources Cause High Discrepancy...

    • acs.figshare.com
    xlsx
    Updated Jan 10, 2025
    Cite
    Fu Liu; Fan Fan; Qingmiao Yu; Hongqiang Ren; Jinju Geng (2025). Variable KOC and Poor-Quality Data Sources Cause High Discrepancy in Current Mobility Assessment of Organic Substances [Dataset]. http://doi.org/10.1021/acsestwater.4c00731.s001
    Available download formats: xlsx
    Dataset updated
    Jan 10, 2025
    Dataset provided by
    ACS Publications
    Authors
    Fu Liu; Fan Fan; Qingmiao Yu; Hongqiang Ren; Jinju Geng
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The widespread distribution of persistent, mobile, and toxic (PMT) organic chemicals in aquatic environments poses a threat to water resources. Current mobility assessments rely on the organic carbon normalized adsorption coefficient (KOC), but KOC can be highly variable depending on the properties of the sorptive phase (soil/sediment). It is commonly overlooked that this variability causes assessment discrepancies. Herein, this variability was quantitatively evaluated based on compiled experimental KOC data sets obtained under OECD guidelines. The results show that both the average discrepancy rate and the relative difference rate are nearly half of those of the substances among recent reports. The underlying reasons are high KOC variability and poor-quality assessment data sources that fail to capture this variability. For one-third of the charged organic compounds, KOC values vary by more than 1 order of magnitude, roughly twice as high as for neutral organic compounds. KOC values from common integrated databases or available quantitative structure–property relationships all differ from the compiled data sets by almost orders of magnitude, especially for charged compounds. The insights presented here have significant value for the future development of proper mobility assessments.
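    KOC normalizes the soil–water distribution coefficient by the organic carbon content of the sorbent, KOC = Kd / fOC, which is why it can still vary across soils and sediments with different properties. The sketch below computes KOC and the spread of a set of values in orders of magnitude (log10 units). The function names and the Kd and fOC values are hypothetical, and the study's discrepancy-rate metrics are not reproduced.

```python
import math


def koc(kd: float, f_oc: float) -> float:
    """Organic-carbon-normalized adsorption coefficient, KOC = Kd / fOC (L/kg)."""
    if not 0 < f_oc <= 1:
        raise ValueError("f_oc must be a fraction in (0, 1]")
    return kd / f_oc


def spread_in_log_units(values) -> float:
    """Range of log10(values); above 1 means the values span more than one order of magnitude."""
    logs = [math.log10(v) for v in values]
    return max(logs) - min(logs)


# Hypothetical batch-equilibrium results for one compound on three soils:
# (Kd in L/kg, fraction of organic carbon). Not data from the study.
soils = [(2.0, 0.010), (15.0, 0.020), (1.2, 0.025)]
kocs = [koc(kd, f_oc) for kd, f_oc in soils]
print([round(k) for k in kocs])             # [200, 750, 48]
print(round(spread_in_log_units(kocs), 2))  # 1.19
```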

  15. Swansea, IL annual median income by work experience and sex dataset: Aged...

    • neilsberg.com
    csv, json
    Updated Feb 27, 2025
    Cite
    Neilsberg Research (2025). Swansea, IL annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a53a69ef-f4ce-11ef-8577-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Swansea, Illinois
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Swansea. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

    Key observations: Insights from 2023

    Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

    - All workers, aged 15 years and older: In Swansea, the median income for all workers aged 15 years and older, regardless of work hours, was $48,750 for males and $30,417 for females.

    These income figures highlight a substantial gender-based income gap in Swansea. Women, regardless of work hours, earn 62 cents for each dollar earned by men. This significant gender pay gap, approximately 38%, underscores concerning gender-based income inequality in the village of Swansea.

    - Full-time workers, aged 15 years and older: In Swansea, among full-time, year-round workers aged 15 years and older, males earned a median income of $76,189, while females earned $59,162, a 22% gender pay gap among full-time workers. In other words, women earn 78 cents for each dollar earned by men in full-time roles. The gap among full-time workers is narrower than the gap across all workers, although women working full-time still face a substantial wage discrepancy compared to men in the same roles.

    The gender pay gap is therefore larger when all workers, including those not employed full-time, are considered than when only full-time workers are compared. This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Swansea.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

    Gender classifications include:

    • Male
    • Female

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

    Variables / Data Columns

    • Year: This column presents the data year. Expected values are 2010 to 2023
    • Male Total Income: Annual median income, for males regardless of work hours
    • Male FT Income: Annual median income, for males working full time, year-round
    • Male PT Income: Annual median income, for males working part time
    • Female Total Income: Annual median income, for females regardless of work hours
    • Female FT Income: Annual median income, for females working full time, year-round
    • Female PT Income: Annual median income, for females working part time
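    A minimal sketch for reading this dataset's CSV export with Python's standard library and computing women's cents per men's dollar for each year. It assumes the header row uses exactly the column names listed above and that blank cells mean "not available"; the actual file layout has not been checked, and `parse_money` and `yearly_cents` are hypothetical helpers. The one-row sample is built from the 2023 Swansea figures quoted above, with the part-time cells left blank.

```python
import csv
import io


def parse_money(text: str):
    """Turn '$48,750' or '48750' into a float; an empty cell becomes None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def yearly_cents(rows, kind: str = "Total"):
    """Yield (year, women's cents per men's dollar) for kind in {"Total", "FT", "PT"}."""
    for row in rows:
        male = parse_money(row[f"Male {kind} Income"])
        female = parse_money(row[f"Female {kind} Income"])
        if male is None or female is None or male == 0:
            continue  # skip years where either median is missing
        yield int(row["Year"]), round(100 * female / male)


# One-row sample built from the 2023 Swansea, IL figures (PT cells left blank).
SAMPLE = """Year,Male Total Income,Male FT Income,Male PT Income,Female Total Income,Female FT Income,Female PT Income
2023,"$48,750","$76,189",,"$30,417","$59,162",
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(list(yearly_cents(rows)))        # [(2023, 62)]
print(list(yearly_cents(rows, "FT")))  # [(2023, 78)]
print(list(yearly_cents(rows, "PT")))  # []
```

    For a downloaded file, replace the `io.StringIO` sample with `open(path, newline="")`.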

    Good to know

    Margin of Error

    The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Swansea median household income by race.

  16. Data from: S1 Dataset -

    • plos.figshare.com
    zip
    Updated Jun 21, 2024
    Cite
    Zhenwei Li; Shihai Zhang; Chongnian Qu; Zimiao Zhang; Feng Sun (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0304819.s001
    Available download formats: zip
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Zhenwei Li; Shihai Zhang; Chongnian Qu; Zimiao Zhang; Feng Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Solar cells play a significant role in aerospace equipment. Based on the characteristics of surface defects arising in the solar cell manufacturing process, common surface defects are divided into three categories: difficult-to-detect defects (mismatch), general defects (bubble, glass-crack, and cell-crack), and easy-to-detect defects (glass-upside-down). For the different defect types, the paper proposes deep learning models with different optimization methods and a classification detection method based on multi-model fusion. In the proposed model, to resolve the mismatch between the default anchor box sizes of the YOLOv5s model and the extreme scale of the battery mismatch defect label boxes, the K-means algorithm was used to re-cluster dedicated anchor boxes for the mismatch defect label boxes. To improve the overall detection accuracy of the YOLOv5s model for general defects, the model was also improved through image preprocessing, anchor box improvement, and detection head replacement. To ensure recognition accuracy and improve detection speed for easy-to-detect defects, the lightweight classification network MobileNetV2 was used to classify cells with glass-upside-down defects. The experimental results show that the proposed optimization model and classification detection method can significantly improve defect detection precision: the detection precision for mismatch, bubble, glass-crack, and cell-crack defects reaches 95.64%, 91.8%, 93.1%, and 98.0%, respectively. Using the lightweight model trained on the glass-upside-down defect dataset, the average classification accuracy reaches 100% and the detection speed reaches 13.29 frames per second. Comparison experiments show that the proposed model greatly improves detection accuracy over the original model and that the lightweight classification network improves defect detection speed even more markedly, which confirms the effectiveness of the proposed optimization model and the multi-defect classification detection method for solar cell defect detection.
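    The anchor re-clustering step can be illustrated with the common "K-means with 1 - IoU distance" approach to fitting anchor boxes to label-box sizes. This is a generic sketch, not the paper's code (the description does not specify its distance or centroid choices), and it is not the anchor utility that ships with YOLOv5. The helper names and box sizes are hypothetical, mixing small boxes with very wide, thin ones to mimic extreme-scale labels.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) label-box sizes into k anchors, using 1 - IoU as the distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        # New anchor = mean width and height of its cluster (keep old one if empty).
        new_anchors = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical label-box sizes in pixels: three small boxes and three very wide, thin ones.
boxes = [(10, 10), (12, 11), (11, 9), (300, 8), (320, 10), (310, 9)]
print(kmeans_anchors(boxes, k=2))  # [(11.0, 10.0), (310.0, 9.0)]
```

    Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering, which matters when label boxes span very different scales.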

  17. Sinton, TX annual median income by work experience and sex dataset: Aged...

    • neilsberg.com
    csv, json
    Updated Feb 27, 2025
    Cite
    Neilsberg Research (2025). Sinton, TX annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a5372e97-f4ce-11ef-8577-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sinton, Texas
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Sinton. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

    Key observations: Insights from 2023

    Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

    - All workers, aged 15 years and older: In Sinton, the median income for all workers aged 15 years and older, regardless of work hours, was $33,381 for males and $17,755 for females.

    These income figures highlight a substantial gender-based income gap in Sinton. Women, regardless of work hours, earn 53 cents for each dollar earned by men. This significant gender pay gap, approximately 47%, underscores concerning gender-based income inequality in the city of Sinton.

    - Full-time workers, aged 15 years and older: In Sinton, among full-time, year-round workers aged 15 years and older, males earned a median income of $45,032, while females earned $34,583, a 23% gender pay gap among full-time workers. In other words, women earn 77 cents for each dollar earned by men in full-time roles. The gap among full-time workers is narrower than the gap across all workers, although women working full-time still face a substantial wage discrepancy compared to men in the same roles.

    The gender pay gap is therefore larger when all workers, including those not employed full-time, are considered than when only full-time workers are compared. This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Sinton.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.
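    Converting an amount to constant dollars scales it by the ratio of price-index levels: amount_target = amount × (index_target / index_year). The sketch below shows that arithmetic; the function name is hypothetical, the index values are placeholders rather than published R-CPI-U-RS levels, and Neilsberg's exact procedure is not documented here beyond naming the index.

```python
def to_constant_dollars(amount: float, index_year: float, index_target: float) -> float:
    """Express an amount from one year in target-year dollars using a price index."""
    return amount * index_target / index_year


# Placeholder index levels, NOT published R-CPI-U-RS values.
index = {2010: 100.0, 2023: 130.0}
print(to_constant_dollars(30_000, index[2010], index[2023]))  # 39000.0
```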

    Gender classifications include:

    • Male
    • Female

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

    Variables / Data Columns

    • Year: This column presents the data year. Expected values are 2010 to 2023
    • Male Total Income: Annual median income, for males regardless of work hours
    • Male FT Income: Annual median income, for males working full time, year-round
    • Male PT Income: Annual median income, for males working part time
    • Female Total Income: Annual median income, for females regardless of work hours
    • Female FT Income: Annual median income, for females working full time, year-round
    • Female PT Income: Annual median income, for females working part time

    Good to know

    Margin of Error

    The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Sinton median household income by race. You can refer to it here

  18.

    Comparison of classifier performance across two data sets.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Joseph Schlecht; Matthew E. Kaplan; Kobus Barnard; Tatiana Karafet; Michael F. Hammer; Nirav C. Merchant (2023). Comparison of classifier performance across two data sets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1000093.t003
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Joseph Schlecht; Matthew E. Kaplan; Kobus Barnard; Tatiana Karafet; Michael F. Hammer; Nirav C. Merchant
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The top table shows the average classifier performance for cross-validation on the 9-locus public STR data. The bottom table shows the performance for the same test on a 9-locus subset of our ground-truth training data. The two data sets perform similarly here, but overall performance is lower than in the 15-locus cross-validation test on our ground-truth data (Table 1), indicating that increasing the number of markers in the data set can significantly improve performance.
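    The cross-validation test described above reports performance averaged over held-out folds. The following is a minimal sketch of k-fold cross-validation, assuming contiguous folds and a hypothetical 1-nearest-neighbour classifier on toy numeric vectors standing in for STR repeat counts; it is not the study's classifiers, data, or fold scheme.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_neighbour(train_X, train_y, test_X):
    """Hypothetical 1-NN classifier: label each test vector by its closest training vector."""
    preds = []
    for x in test_X:
        # Squared Euclidean distance to every training vector.
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds


def cross_validated_accuracy(X, y, k, fit_predict):
    """Average the per-fold accuracy of fit_predict over k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return mean(scores)


# Toy data: two well-separated groups, interleaved so every fold contains both labels.
X = [[10, 12], [20, 22], [11, 12], [21, 22], [10, 13], [20, 23]]
y = ["A", "B", "A", "B", "A", "B"]
print(cross_validated_accuracy(X, y, 3, nearest_neighbour))  # 1.0
```

    Running the same routine on a 9-marker and a 15-marker version of one data set would mirror the comparison in the table. Contiguous folds are used only for simplicity; shuffled or stratified folds are more usual in practice.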

  19.

    Annotation times for the external testing dataset.

    • plos.figshare.com
    xls
    Updated Dec 3, 2024
    Cite
    Henrik Mustonen; Antti Isosalo; Minna Nortunen; Mika Nevalainen; Miika T. Nieminen; Heikki Huhta (2024). Annotation times for the external testing dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0313126.t004
    Available download formats: xls
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Henrik Mustonen; Antti Isosalo; Minna Nortunen; Mika Nevalainen; Miika T. Nieminen; Heikki Huhta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotation times for the external testing dataset.

  20.

    Crossett, AR annual median income by work experience and sex dataset: Aged...

    • neilsberg.com
    csv, json
    Updated Feb 27, 2025
    Cite
    Neilsberg Research (2025). Crossett, AR annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a50e0628-f4ce-11ef-8577-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Crossett, Arkansas
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
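    Adjusting for inflation in this way amounts to rescaling each year's nominal figure by the ratio of price-index levels. A minimal sketch, assuming a simple ratio of annual index levels (the exact Neilsberg procedure is not documented here); the function name is hypothetical, and the index values in the example are placeholders, not published R-CPI-U-RS figures.

```python
def to_target_year_dollars(nominal, index_source_year, index_target_year):
    """Convert a nominal amount to target-year dollars using the ratio of index levels."""
    if index_source_year <= 0 or index_target_year <= 0:
        raise ValueError("price-index levels must be positive")
    return nominal * index_target_year / index_source_year


# Placeholder index levels: prices rose 25% between the source and target years.
print(to_target_year_dollars(40000, 100.0, 125.0))  # 50000.0
```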
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Crossett. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

    Key observations: Insights from 2023

    Based on our analysis of the ACS 2019-2023 5-Year Estimates, we present the following observations:

    - All workers, aged 15 years and older: In Crossett, the median income for all workers aged 15 years and older, regardless of work hours, was $31,618 for males and $19,701 for females.

    These income figures highlight a substantial gender-based income gap in Crossett. Women, regardless of work hours, earn 62 cents for each dollar earned by men. This significant gender pay gap, approximately 38%, underscores concerning gender-based income inequality in the city of Crossett.

    - Full-time workers, aged 15 years and older: In Crossett, among full-time, year-round workers aged 15 years and older, males earned a median income of $54,148, while females earned $39,318, a 27% gender pay gap among full-time workers. In other words, women earn 73 cents for each dollar earned by men in full-time roles. Although this gap is narrower than the gap across all workers, it still represents a substantial income disparity: even when working full time, women face a significant wage discrepancy compared to men in the same roles.

    Notably, the gender pay gap was larger across all workers, including those not employed full time (38%), than among full-time, year-round workers (27%). This suggests that full-time employment offers a more equitable income scenario for women than other employment patterns in Crossett.
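    The cents-on-the-dollar and pay gap figures above follow directly from the ratio of female to male medians. A small check using the 2023 Crossett medians quoted above; the function names are hypothetical, and rounding to the nearest whole cent and whole percent is an assumption about how the published figures were rounded.

```python
def cents_per_dollar(female_median, male_median):
    """Women's median income per dollar of men's median income, in whole cents."""
    return round(100 * female_median / male_median)


def pay_gap_percent(female_median, male_median):
    """Gap between male and female medians as a whole-number percentage of the male median."""
    return round(100 * (male_median - female_median) / male_median)


# 2023 Crossett medians (ACS 2019-2023 5-Year Estimates).
print(cents_per_dollar(19701, 31618), pay_gap_percent(19701, 31618))  # all workers: 62 38
print(cents_per_dollar(39318, 54148), pay_gap_percent(39318, 54148))  # full-time: 73 27
```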

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023-inflation-adjusted dollars.

    Gender classifications include:

    • Male
    • Female

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.
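    The two definitions above can be expressed as a small classification rule. A minimal sketch with a hypothetical function name, assuming each person is described by a single hours-per-week figure and a weeks-worked count; note that, as defined, a person working 35 or more hours per week for fewer than 50 weeks falls into neither category.

```python
from typing import Optional


def employment_type(hours_per_week: float, weeks_worked: int) -> Optional[str]:
    """Classify a worker using the full-time, year-round and part-time definitions."""
    if hours_per_week <= 0 or weeks_worked <= 0:
        return None  # did not work during the previous calendar year
    if hours_per_week >= 35 and weeks_worked >= 50:
        return "full-time, year-round"
    if hours_per_week < 35:
        return "part-time"
    return None  # full-time hours but fewer than 50 weeks: neither category


print(employment_type(40, 52))  # full-time, year-round
print(employment_type(20, 30))  # part-time
print(employment_type(40, 26))  # None
```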

    Variables / Data Columns

    • Year: This column presents the data year. Expected values are 2010 to 2023
    • Male Total Income: Annual median income, for males regardless of work hours
    • Male FT Income: Annual median income, for males working full time, year-round
    • Male PT Income: Annual median income, for males working part time
    • Female Total Income: Annual median income, for females regardless of work hours
    • Female FT Income: Annual median income, for females working full time, year-round
    • Female PT Income: Annual median income, for females working part time

    Good to know

    Margin of Error

    The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
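    One common way to apply that caution is to check whether two estimates differ by more than their sampling error. A minimal sketch, assuming the margins of error are published at the ACS's standard 90% confidence level (so the standard error is the MOE divided by 1.645) and that the two estimates are independent; the function names and the MOE values below are hypothetical, since none are listed in this description.

```python
import math

Z_90 = 1.645  # critical z value for a 90% confidence level


def standard_error(moe, z=Z_90):
    """Convert a margin of error to a standard error."""
    return moe / z


def significantly_different(est1, moe1, est2, moe2, z=Z_90):
    """Return True if two independent estimates differ at the given confidence level."""
    se_diff = math.sqrt(standard_error(moe1, z) ** 2 + standard_error(moe2, z) ** 2)
    if se_diff == 0:
        return est1 != est2
    return abs(est1 - est2) / se_diff > z


# Hypothetical estimates and margins of error.
print(significantly_different(50000, 10000, 45000, 10000))  # False
print(significantly_different(50000, 1645, 40000, 1645))    # True
```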

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Crossett median household income by race. You can refer to it here
