This dataset was created by Xiaochun Xu
Released under Data files © Original Authors

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Many by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Many. The dataset can be used to understand the population distribution of Many by gender and age. For example, we can identify the largest age group for both men and women in Many. It can also be used to see how the gender ratio changes from birth to the oldest age group, and to compare the male-to-female ratio across age groups.
Key observations
Largest age group (population): Male # 10-14 years (130) | Female # 35-39 years (142). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Many Population by Gender.

https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This publication includes analysis of data for the months January 2024 to March 2024 from the Female Genital Mutilation (FGM) Enhanced Dataset (SCCI 2026) which is a repository for individual level data collected by healthcare providers in England, including acute hospital providers, mental health providers and GP practices. The report includes data on the type of FGM, age at which FGM was undertaken and in which country, the age of the woman or girl at her latest attendance and if she was advised of the health implications and illegalities of FGM and various other analyses. Some data for earlier years are reported.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Many by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Many across both sexes and to determine which sex constitutes the majority.
Key observations
The population is majority female: 54.68% of the total population is female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Many Population by Race & Ethnicity.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Population: Female: Aged 15-64 data was reported at 106,545,028.000 Person in 2017. This records an increase from the previous number of 106,254,414.000 Person for 2016. United States US: Population: Female: Aged 15-64 data is updated yearly, averaging 81,112,897.000 Person (median) from Dec 1960 to 2017, with 58 observations. The data reached an all-time high of 106,545,028.000 Person in 2017 and a record low of 54,897,168.000 Person in 1960. United States US: Population: Female: Aged 15-64 data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s USA – Table US.World Bank: Population and Urbanization Statistics. Female population between the ages 15 to 64. Population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. World Bank staff estimates using the World Bank's total population and age/sex distributions of the United Nations Population Division's World Population Prospects: 2017 Revision. Aggregation: Sum. Relevance to gender indicator: Knowing how many girls, adolescents and women there are in a population helps a country in determining its provision of services.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Actual value and historical data chart for World Population Female Percent Of Total

https://creativecommons.org/publicdomain/zero/1.0/
This comprehensive indicator offers detailed insights into the average hourly earnings derived from paid employment across various dimensions, including sex, occupation, age, and disability status. By examining the interplay of these factors, the indicator provides a nuanced understanding of wage differentials within the workforce. This information is invaluable for assessing patterns of income inequality, identifying potential areas for policy intervention, and fostering a more inclusive and equitable employment environment. Through its multifaceted approach, the indicator enables a thorough analysis of how various demographic variables intersect with earnings, thereby contributing to a more holistic comprehension of labor market dynamics and the socioeconomic landscape.
PS: I hope this dataset will answer many of your questions and trigger many new ones. I will read every comment and notebook, as I always do, and I hope to see your mind-blowing conclusions. Good luck, and thank you for being here.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Population: as % of Total: Female: Aged 65 and Above data was reported at 16.925 % in 2017. This records an increase from the previous number of 16.550 % for 2016. United States US: Population: as % of Total: Female: Aged 65 and Above data is updated yearly, averaging 14.035 % (median) from Dec 1960 to 2017, with 58 observations. The data reached an all-time high of 16.925 % in 2017 and a record low of 10.023 % in 1960. United States US: Population: as % of Total: Female: Aged 65 and Above data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s USA – Table US.World Bank: Population and Urbanization Statistics. Female population 65 years of age or older as a percentage of the total female population. Population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. World Bank staff estimates based on age/sex distributions of United Nations Population Division's World Population Prospects: 2017 Revision. Aggregation: Weighted average. Relevance to gender indicator: Knowing how many girls, adolescents and women there are in a population helps a country in determining its provision of services.

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Quick reproducibility & validation (PowerShell)
```powershell
# Check that the corpus file exists and count its lines
Test-Path .\corpus\audit_corpus_gender_bias.csv
Get-Content .\corpus\audit_corpus_gender_bias.csv | Measure-Object -Line
# Create and activate a virtual environment, then install dependencies
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install pandas tqdm
```
Quick start: load and basic stats (Python)
```python
import pandas as pd

df = pd.read_csv("corpus/audit_corpus_gender_bias.csv")
# Number of prompts per name category
print(df['name_category'].value_counts())
# Five randomly chosen prompts
print(df.sample(5)['full_prompt_text'].to_list())
```
Recommended evaluation workflow (high level)
1. Use this CSV to generate model responses for each prompt (consistent model settings).
2. Clean & parse outputs into numeric/label format as appropriate (use structured prompting where possible).
3. Aggregate responses grouped by name_category (Male vs Female) while holding profession/trait/template constant.
4. Compute descriptive stats per group (mean, median, sd) and per stratum (profession × trait_category).
5. Run statistical tests and effect-size estimates:
   - Permutation test or Mann-Whitney U (non-parametric)
   - Bootstrap confidence intervals for medians/means
   - Cohen’s d or Cliff’s delta for effect size
6. Correct for multiple comparisons (Benjamini–Hochberg) when testing many strata.
7. Visualise with violin + boxplots and difference plots with CIs.
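Steps 5 and 6 of the workflow above can be sketched with the standard library alone. This is a minimal illustration, not the dataset author's code: it assumes the model outputs have already been parsed into two lists of numbers (one per name category), and the function names are hypothetical.

```python
import random
import statistics


def permutation_test(male, female, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in means (male - female)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = statistics.mean(male) - statistics.mean(female)
    pooled = list(male) + list(female)
    n_male = len(male)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_male]) - statistics.mean(pooled[n_male:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


def cohens_d(male, female):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(male), len(female)
    v1, v2 = statistics.variance(male), statistics.variance(female)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(male) - statistics.mean(female)) / pooled_sd


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice one would call `permutation_test` once per stratum (profession × trait_category), collect the p-values, and pass the whole list to `benjamini_hochberg` before interpreting any single result.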
Suggested quantitative metrics
- Mean/median differences (Male − Female)
- Bootstrap 95% CI on difference
- Cohen’s d or Cliff’s delta
- p-values from permutation test / Mann-Whitney U
- Proportion of model outputs that deviate from the expected neutral baseline (for categorical outputs)
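Two of the metrics listed above, the bootstrap 95% CI on the mean difference and Cliff's delta, might be computed as follows. This is a sketch under the assumption that responses are numeric lists per name category; it uses a simple percentile bootstrap, and the function names are illustrative only.

```python
import random
import statistics


def bootstrap_diff_ci(male, female, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(male) - mean(female)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        m = rng.choices(male, k=len(male))
        f = rng.choices(female, k=len(female))
        diffs.append(statistics.mean(m) - statistics.mean(f))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def cliffs_delta(male, female):
    """Cliff's delta: P(male > female) - P(male < female) over all pairs."""
    greater = sum(1 for a in male for b in female if a > b)
    less = sum(1 for a in male for b in female if a < b)
    return (greater - less) / (len(male) * len(female))
```

A delta near 0 means the two distributions overlap heavily; values near +1 or -1 mean one group almost always scores higher than the other.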
Suggested visualizations
- Grouped violin plots (by profession) split by name_category
- Difference-in-means bar with bootstrap CI per profession
- Heatmap of effect sizes (profession × trait_category)
- Distribution overlay of raw responses
Recommended analysis notebooks/kernels to provide on Kaggle
- 01_data_load_and_summary.ipynb: load CSV, sanity checks, counts
- 02_model_response_collection.ipynb: how to call a model endpoint safely (placeholders)
- 03_cleaning_and_parsing.ipynb: parsing rules and robustness tests
- 04_statistical_tests.ipynb: permutation tests, bootstrap CI, effect sizes
- 05_visualizations.ipynb: plots and interpretation
Security & best practices
- Never commit API keys in notebooks. Use environment variables and secrets built into Kaggle.
- Keep model calls rate-limited and log failures; use retry/backoff.
- Use fixed random seeds for reproducibility where sampling occurs.
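The first two practices above could look roughly like this. The environment variable name `MODEL_API_KEY` and both helper functions are hypothetical; the actual model client is left out and represented by any zero-argument callable.

```python
import os
import random
import time


def get_api_key(var_name="MODEL_API_KEY"):
    """Read the key from the environment instead of hard-coding it (name is assumed)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def call_with_retry(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying failures with exponential backoff plus a little jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            # Log the failure so dropped prompts can be audited later.
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            sleep(delay)
```

Passing `sleep` as a parameter makes the backoff easy to test without actually waiting.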
Limitations & caveats (must show on dataset page)
- Cultural and name recognition: names may suggest different demographics across regions; results are context-sensitive.
- Only Male vs Female: dataset intentionally isolates binary gender categories; extend carefully for broader demographic categories.
- Controlled prompts reduce ecological validity: real interactions may be longer and noisier.
- Parsing risk: models sometimes add explanatory text; structured prompting or requesting a JSON response is recommended.
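To illustrate the parsing risk noted above, here is one possible way to pull a numeric value out of a response that wraps JSON in extra prose. It assumes the prompt asked for a flat JSON object with a `score` key; both the key name and the function are assumptions, not part of the dataset.

```python
import json
import re


def extract_score(raw_output, key="score"):
    """Return the number stored under `key` in the first flat JSON object, else None."""
    # Non-greedy match: handles flat objects only, not nested braces.
    match = re.search(r"\{.*?\}", raw_output, flags=re.DOTALL)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    value = payload.get(key)
    # Reject booleans and strings so only real numbers are kept.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)
```

Responses that return `None` are worth counting and reporting separately, since a high parse-failure rate in one name category would itself bias the comparison.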
How this dataset differs from academic prototypes
- This corpus is deterministic and template-driven to ensure strict control over confounds (only the name varies). Use it when you require reproducibility and controlled comparisons rather than open-ended, real-world prompts.
Suggested Kaggle tags and categor...

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Actual value and historical data chart for World Population Female

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Actual value and historical data chart for United States Population Female Percent Of Total

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Population: as % of Total: Female: Aged 15-64 data was reported at 64.768 % in 2017. This records a decrease from the previous number of 65.038 % for 2016. United States US: Population: as % of Total: Female: Aged 15-64 data is updated yearly, averaging 64.683 % (median) from Dec 1960 to 2017, with 58 observations. The data reached an all-time high of 66.046 % in 2009 and a record low of 59.938 % in 1962. United States US: Population: as % of Total: Female: Aged 15-64 data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s USA – Table US.World Bank: Population and Urbanization Statistics. Female population between the ages 15 to 64 as a percentage of the total female population. Population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. World Bank staff estimates based on age/sex distributions of United Nations Population Division's World Population Prospects: 2017 Revision. Aggregation: Weighted average. Relevance to gender indicator: Knowing how many girls, adolescents and women there are in a population helps a country in determining its provision of services.

Many beautiful pictures of people: men, women, girls, boys, and cute babies, performing different activities in various walks of life.
JPG versions of the images are located in the images folder, and descriptions of the images are in the image info.xlsx file, which contains the id, image name, and caption of each image.
Saber MalekzadeH ( @sabermalek ) is my supervisor, and I got the idea from him to collect such images for Image Captioning.
Data scraped from https://freerangestock.com/gallery.php?gid=42&page_num=1&orderby=code
Cover image from https://ux.shopify.com/you-cant-just-draw-purple-people-and-call-it-diversity-e2aa30f0c0e8
This image dataset can be used for Image Captioning, to generate a textual description of an image.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Peru PE: Population: as % of Total: Female: Aged 65 and Above data was reported at 7.805 % in 2017. This records an increase from the previous number of 7.612 % for 2016. Peru PE: Population: as % of Total: Female: Aged 65 and Above data is updated yearly, averaging 4.254 % (median) from Dec 1960 to 2017, with 58 observations. The data reached an all-time high of 7.805 % in 2017 and a record low of 3.721 % in 1960. Peru PE: Population: as % of Total: Female: Aged 65 and Above data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Peru – Table PE.World Bank: Population and Urbanization Statistics. Female population 65 years of age or older as a percentage of the total female population. Population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. World Bank staff estimates based on age/sex distributions of United Nations Population Division's World Population Prospects: 2017 Revision. Aggregation: Weighted average. Relevance to gender indicator: Knowing how many girls, adolescents and women there are in a population helps a country in determining its provision of services.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vietnam VN: Population: Female: Aged 15-64 data was reported at 33,496,592.000 Person in 2017. This records an increase from the previous number of 33,258,832.000 Person for 2016. Vietnam VN: Population: Female: Aged 15-64 data is updated yearly, averaging 18,965,869.000 Person (median) from Dec 1960 to 2017, with 58 observations. The data reached an all-time high of 33,496,592.000 Person in 2017 and a record low of 9,167,274.000 Person in 1960. Vietnam VN: Population: Female: Aged 15-64 data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Vietnam – Table VN.World Bank.WDI: Population and Urbanization Statistics. Female population between the ages 15 to 64. Population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. World Bank staff estimates using the World Bank's total population and age/sex distributions of the United Nations Population Division's World Population Prospects: 2017 Revision. Aggregation: Sum. Relevance to gender indicator: Knowing how many girls, adolescents and women there are in a population helps a country in determining its provision of services.

This is a dataset built from a Wikipedia table of ~500 female queer characters in television. Duplicate shows were removed to create a list of ~250 shows that featured queer women. I added columns for the number of episodes, how many female queer characters were featured, how many of those characters died, genre, and show run time, in order to analyze the Bury Your Gays trope on TV. This data was collected by hand, but collection could likely be automated by drawing on a few data sources.

https://creativecommons.org/publicdomain/zero/1.0/
By Jeffrey Mvutu Mabilama [source]
Welcome to an exciting exploration of global C2C fashion store user behaviour! This dataset seeks to serve as a benchmark by providing valuable insights into e-commerce users, enabling you to make informed decisions and effectively grow your business. Let's dive right into the data!
This dataset contains records on over 9 million registered users from a successful online C2C fashion store launched in Europe around 2009 and later expanded worldwide. It includes metrics such as country, gender, active users, top buyers/sellers/ratio*, products bought/sold/listed* and social network features (likes/follows). Furthermore, this is just a preview of a much larger data set, which contains more detailed information including product listings, comments on listed products, etc.
E-commerce has become an essential part of our lives: people are now accustomed to buying anything online with a few clicks. Selling online and providing good customer service both involve many unknowns, so understanding user behavior is key to success in this domain. With this dataset you can answer questions such as 'How many customers are likely to drop off after years of using my service?', 'Are my users active enough compared to those in this dataset?', or 'How likely are people from other countries to sign up on a C2C website?' In addition, if you think this kind of dataset may be useful, don't forget to show your support or appreciation by leaving an upvote or comment on the page!
My Telegram bot will answer any queries regarding the datasets and lets you contact me directly if necessary. Also, please don't forget to check out the *[data.world page](https://data.world/jfreex/e-commerce-users-of-a-french-c2c
This dataset provides a useful overview of global users' behavior in an online C2C fashion store. The data includes metrics such as buyers, top buyers, top buyer ratio, female buyers and their respective ratios, etc., per country. This dataset can be used to gain insights into how global audiences interact with the store and draw conclusions from comparison between different countries.
To make use of this dataset, first familiarize yourself with the metrics it includes: country; number of overall buyers; number of top buyers; the ratio of top buyers to total buyers; female-related data (buyers, top female buyers); bought-to-wish/like ratio (top and non-top separately); overall products bought/wished/liked; total products sold by top sellers in the same country versus what they sold outside the country; mean values for product stats (sold, listed, etc.), computed over the whole population or only over users who perform those actions multiple times; average days users spend offline or lurking on the site without posting or buying anything; and mean follower count.
Using this data, one could generate reports about user behavior within particular countries, either by computing the statistics manually or by querying the dataset with libraries like Pandas or with SQL. The dataset holds the per-country values needed to answer questions such as how many people buy per region and what type of buyer they are: top buyer? Female? And so on.
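As a small illustration of such a per-country report, the sketch below computes the share of top buyers and of female buyers among all buyers. It assumes one record per country, and the field names (`country`, `buyers`, `top_buyers`, `female_buyers`) are hypothetical stand-ins for whatever the actual file uses.

```python
def buyer_ratios(rows):
    """Per-country share of top buyers and female buyers among all buyers."""
    report = {}
    for row in rows:
        buyers = row["buyers"]
        if buyers == 0:
            continue  # skip countries without buyers to avoid dividing by zero
        report[row["country"]] = {
            "top_buyer_ratio": row["top_buyers"] / buyers,
            "female_buyer_ratio": row["female_buyers"] / buyers,
        }
    return report
```

With the file loaded into pandas, the same ratios would be two column divisions; the loop form is used here only to keep the example dependency-free.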
Further potential work could involve using machine learning tools such as clustering algorithms to group similar customers together based on traits like age group or profession, so that personalised marketing promotions can be targeted at these customer clusters rather than aiming more generic ads at everyone!
Finally, combined with other related product datasets, which are available upon request via JfreexDatasets_bot provided by the Jfreex team, this dataset can become another powerful tool providing actionable insights into customers today, allowing you to build better strategies for improving customer experience tomorrow!
- Analyzing the conversion rate of users on a website - Comparing user metrics like the overall number of buyers, female buyers, top buyers ratio and top buyer gender can help determine if users in certain countries are more or less likely to convert into customers. Additionally, comparing average metrics like products bought or offl...

Abstract: Mate choice is a key driver of evolutionary phenomena such as sexual dimorphism. Social mate choice is studied less often than reproductive mate choice, but for species that exhibit biparental care, choice of a social mate may have important implications for offspring survival and success. Many species make pairing decisions based on size that can lead to population-scale pairing patterns such as assortative and disassortative mating by size. Other size-based pairing patterns, such as females pairing with males larger than themselves, have been commonly studied in humans, but less often studied in nonhuman animal systems. Here we show that sexually size-dimorphic mountain chickadees, Poecile gambeli, appear to exhibit multiple self-referential pairing patterns when choosing a social mate. Females paired with males that were larger than themselves more often than expected by chance, and they paired with males that were slightly larger than themselves more often than they paired with males that were much larger than themselves. Preference for slightly larger males versus much larger males did not appear to be driven by reproductive benefits as there were no statistically significant differences in reproductive performance between pairs in which males were slightly larger and pairs in which males were much larger than females. Our results indicate that self-referential pairing beyond positive and negative assortment may be common in nonhuman animal systems.

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains information about high school students and their actual and predicted performance on an exam. Most of the information, including some general information about high school students and their grade for an exam, was based on an already existing dataset, while the predicted exam performance was based on a human experiment. In this experiment, participants were shown short descriptions of the students (based on the information in the original data) and had to rank and grade according to their expected performance. Prior to this task some participants were exposed to some "Stereotype Activation", suggesting that boys perform less well in school than girls.
Based on this dataset (which is also available on kaggle), we extracted a number of student profiles that participants had to make grade predictions for. For more information about this dataset we refer to the corresponding kaggle page: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption
Note that we performed some preprocessing on the original data:
The original data consisted of two parts: the information about students following a Maths course and the information about students following a Portuguese course. Since in both datasets the same type of information was recorded, we merged both datasets and added a column "subject", to show which course each student belongs to
We excluded all data where G3 = 0 (i.e. the grade for the last exam = 0)
From original_data.csv we randomly sampled 856 students that participants in our study had to make grade predictions for.
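The preprocessing steps listed above (merge the two course tables, add a "subject" column, drop rows with G3 = 0, then sample student profiles) can be sketched as follows. This is not the authors' script: records are plain dictionaries, the subject labels and the seed are assumptions, and only the `G3` and `subject` column names come from the description.

```python
import random


def preprocess(math_rows, portuguese_rows, n_sample, seed=0):
    """Merge both course tables, drop G3 == 0, and sample student profiles."""
    merged = []
    for subject, rows in (("Maths", math_rows), ("Portuguese", portuguese_rows)):
        for row in rows:
            # Copy the record and tag it with the course it came from.
            merged.append({**row, "subject": subject})
    # Exclude students whose final exam grade is 0.
    kept = [row for row in merged if row["G3"] != 0]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(kept, n_sample)  # sampling without replacement
    return kept, sample
```

For the study described here, `n_sample` would be 856; the returned `kept` list plays the role of original_data.csv.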
index - this column corresponds to the indices in the file "original_data.csv". Through these indices, it is possible to add columns from the original data to the dataset with the grade predictions
ParticipantID - the ID of the participant who made the performance predictions for the corresponding student. Predictions needed to be made for 856 students, and each participant made 8 predictions in total. Thus there are 107 different participant IDs
name - to make the prediction task more engaging for participants, each of the 8 student profiles that participants had to grade and rank was randomly matched to one of four boys' or girls' names (depending on the sex of the student)
sex - the sex of each student, either female (F) or male (M). For benchmarking fair ML algorithms, this can be used as the sensitive attribute. We assume that in the fair version of the decision variable ("Pass"), no sex discrimination occurs. The biased versions of the variable ("Predicted Pass") are mostly discriminatory towards male students.
studytime - this variable is taken from the original dataset and denotes how long a student studied for their exam. In the original data this variable consisted of four levels (less than 2 hours vs. 2-5 hours vs. 5-10 hours vs. more than 10 hours). We binned the latter two levels together and encoded this column numerically from 1-3.
freetime - Originally, this variable ranged from 1 (very low) to 5 (very high). We binned this variable into three categories, where level 1 and 2 are binned, as well as level 4 and 5.
romantic - Binary variable, denoting whether the student is in a romantic relationship or not.
Walc - This variable shows how much alcohol each student consumes on the weekend. Originally it ranged from 1 to 5 (5 corresponding to the highest alcohol consumption), but we binned the last two levels together.
goout - This variable shows how often a student goes out in a week. Originally it ranged from 1 to 5 (5 corresponding to going out very often), but we binned the last two levels together.
Parents_edu - This variable was not present in the original dataset. Instead, the original dataset contained two variables, "mum_edu" and "dad_edu". We obtained "Parents_edu" by taking the higher of the two. The variable consists of 4 levels, where 4 is the highest level of education.
absences - This variable shows the number of absences per student. Originally it ranged from 0 to 93, but because large numbers of absences were infrequent, we binned all absences of >=7 into one level.
reason - The reason why a student chose to go to the school in question. The levels are close to home, school's reputation, school's curriculum and other
G3 - The actual grade each student received for the final exam of the course, ranging from 0-20.
Pass - A binary variable showing whether G3 is a passing grade (i.e. >=10) or not.
Predicted Grade - The grade the student was predicted to receive in our experiment
Predicted Rank - In our ex...
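The binning rules described in the variable list above can be summarized in a short sketch. It follows the stated rules for studytime, freetime, Walc, goout, Parents_edu, absences and Pass; the numeric codes chosen for the three freetime categories (1 to 3) are an assumption, since the description does not give them.

```python
def bin_student(row):
    """Apply the described binning rules to one raw student record."""
    freetime_map = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}  # assumed codes 1-3
    return {
        "studytime": min(row["studytime"], 3),      # levels 3 and 4 merged
        "freetime": freetime_map[row["freetime"]],  # 1-2 and 4-5 merged
        "Walc": min(row["Walc"], 4),                # levels 4 and 5 merged
        "goout": min(row["goout"], 4),              # levels 4 and 5 merged
        "Parents_edu": max(row["mum_edu"], row["dad_edu"]),
        "absences": min(row["absences"], 7),        # 7 stands for ">= 7"
        "Pass": row["G3"] >= 10,                    # passing grade threshold
    }
```

For example, a record with studytime 4, freetime 5, Walc 5, goout 2, mum_edu 2, dad_edu 4, absences 12 and G3 9 maps to studytime 3, freetime 3, Walc 4, goout 2, Parents_edu 4, absences 7 and Pass False.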

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population, female (% of total population) in Bangladesh was reported at 50.83 % in 2024, according to the World Bank collection of development indicators, compiled from officially recognized sources. Bangladesh - Population, female (% of total) - actual values, historical data, forecasts and projections were sourced from the World Bank in November 2025.

This dataset was created by Xiaochun Xu
Released under Data files © Original Authors