The Gender Statistics database is a comprehensive source for the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.
The data is split into several files, with the main one being Data.csv. The Data.csv contains all the variables of interest in this dataset, while the others are lists of references and general nation-by-nation information.
Data.csv contains the following fields:
I couldn't find any metadata for these, and I'm not qualified to guess at what each of the variables means. I'll list the variables for each file, and if anyone has any suggestions (or, even better, actual knowledge/citations) as to what they mean, please leave a note in the comments and I'll add your info to the data description.
Country-Series.csv
Country.csv
FootNote.csv
Series-Time.csv
Series.csv
This dataset was downloaded from The World Bank's Open Data project. The summary of the Terms of Use of this data is as follows:
You are free to copy, distribute, adapt, display or include the data in other products for commercial and noncommercial purposes at no cost subject to certain limitations summarized below.
You must include attribution for the data you use in the manner indicated in the metadata included with the data.
You must not claim or imply that The World Bank endorses your use of the data, or use The World Bank’s logo(s) or trademark(s) in conjunction with such use.
Other parties may have ownership interests in some of the materials contained on The World Bank Web site. For example, we maintain a list of some specific data within the Datasets that you may not redistribute or reuse without first contacting the original content provider, as well as information regarding how to contact the original content provider. Before incorporating any data in other products, please check the list: Terms of use: Restricted Data.
-- [ed. note: this last is not applicable to the Gender Statistics database]
The World Bank makes no warranties with respect to the data and you agree The World Bank shall not be liable to you in connection with your use of the data.
This is only a summary of the Terms of Use for Datasets Listed in The World Bank Data Catalogue. Please read the actual agreement that controls your use of the Datasets, which is available here: Terms of use for datasets. Also see World Bank Terms and Conditions.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
I would like to begin this work by offering a few introductory words. This is the first time I am writing this type of work, and I want to emphasize that I am open to any comments and suggestions regarding my work. I know that there is always room for improvement, and I would gladly take advantage of your advice to become better at what I do.
GitHub repository with the dashboard and Python file: https://github.com/Dzynekz/Poland-s-population-by-voivodeship-2002-2021-
Thank you in advance for your time and I wish you a pleasant reading.
The aim of the study is to describe the trends and changes in selected demographic data on the population of Poland from 2002 to 2021. The collected data allows for analysis by administrative division into voivodeships, by age group and by gender. The study focuses on answering the following research questions: 1. How has the population of Poland changed? 2. Did the introduction of the "500+" program in 2016 have a positive impact on the number of births? 3. How have economic age groups changed over the years?
One of the key tools used to acquire reliable data was the API of the Central Statistical Office (CSO), which gave me access to a large database containing, among other things, information about the population of Poland from 2002 to 2021. After studying the open API documentation of the CSO and the methods it provides, I selected the most interesting ranges of population data, divided by voivodeship, age group, and gender. I downloaded the complete set of statistical data using self-developed Python code which, based on defined parameters, automated the necessary API calls, the conversion of the responses, and the saving of the received data in CSV format. With the data in this format, I was able to import, process, and analyze it easily using the chosen tools. Without access to the open API of the CSO, collecting data on population changes over the years would have been much more difficult and time-consuming. Widely available APIs make it possible to acquire, gather, and process valuable data that can be used for analysis, trend forecasting, long-term strategy, or everyday decisions in many areas (for example, the economy and finance).
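The conversion-and-saving step described above can be sketched roughly as follows. This is a minimal illustration, not the author's script: the record layout, the field names (`voivodeship`, `year`, `sex`, `population`) and the values are hypothetical placeholders standing in for an already-parsed API response, and no real CSO endpoint is called.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Placeholder records (hypothetical layout, NOT real statistics).
sample_records = [
    {"voivodeship": "A", "year": 2002, "sex": "female", "population": 100},
    {"voivodeship": "A", "year": 2002, "sex": "male", "population": 90},
]

csv_text = records_to_csv(
    sample_records, ["voivodeship", "year", "sex", "population"]
)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same writer could target a file opened with `newline=""`.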
Below I present a visualization that illustrates changes in the population of Poland over the years:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14257214%2Fb7b5b7b2d92cfc75b225df87a9fd004f%2FDashboard.png?generation=1681134675762237&alt=media
An analysis of the data on the population of Poland from 2002 to 2021 shows several distinct phases. From 2002 to 2006, the population decreased slightly each year: 38.21 million, 38.18 million, 38.17 million, 38.15 million, and 38.13 million, respectively. From 2007 to 2011, the population increased, reaching a peak of 38.53 million in 2011. In the following years, it declined slightly, to 38.38 million in 2019. The largest decrease was recorded in 2020-2021, when the population fell to 37.9 million, most likely due to the COVID-19 pandemic. Overall, comparing 38.21 million in 2002 with 37.9 million in 2021, the population of Poland decreased by about 0.8%.
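As a quick arithmetic check, the relative change can be computed directly from the figures quoted in the paragraph above. The inputs are the rounded values from the text (in millions), so the results are only approximate; the function and variable names are illustrative.

```python
def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100.0


# Rounded figures quoted in the text, in millions of people.
population_2002 = 38.21
population_peak_2011 = 38.53
population_2021 = 37.9

overall_change = percent_change(population_2002, population_2021)
change_from_peak = percent_change(population_peak_2011, population_2021)

print(f"2002 to 2021: {overall_change:.2f}%")
print(f"2011 peak to 2021: {change_from_peak:.2f}%")
```

Comparing against the 2011 peak rather than the 2002 starting point gives a noticeably larger decline, which is worth keeping in mind when quoting a single "overall" percentage.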
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14257214%2F1a9ca78c8df280505efccfddb4d73cb5%2Fobraz_2023-04-10_155200671.png?generation=1681134723226009&alt=media
The changes in the number of residents in individual voivodeships are also notable. The largest increase in population was recorded in the Mazowieckie voivodeship and amounted to 380 thousand.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14257214%2F6de93256154094f2462b9f3c27bcba06%2Fobraz_2023-04-10_155258708.png?generation=1681134780752830&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14257214%2Fad4b988567ac7a915f2ed4461c5b9c82%2Fobraz_2023-04-10_155318168.png?generation=1681134799959003&alt=media
The largest population growth was recorded in the Mazowieckie, Małopolskie, Wielkopolskie and Pomorskie voivodeships. At the same time, the trend in the Śląskie and Lubelskie voivodeships was the opposite, with the population decreasing.
Furthermore, the data shows that in the remaining voivodeships of Poland, the number of inhabitants decreased. The largest decrease was recorded in the Śląskie voivodeship, which amounted to 350,000, and the...
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
About
We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.
Distribution
Detailing the format, size, and structure of the dataset:
Data Volume:
-Total Size: 2.7TB
-Total Videos: 47,547
-Identities Covered: 20,841
-Resolution: 60% 4k(1980), 33% fullHD(1080)
-Formats: MP4
-Full-length videos with visible mouth movements in every frame.
-Minimum face size of 400 pixels.
-Video durations range from 20 seconds to 5 minutes.
-Faces have not been cut out, full screen videos including backgrounds.
Usage
This dataset is ideal for a variety of applications:
Face Recognition & Verification: Training and benchmarking facial recognition models.
Action Recognition: Identifying human activities and behaviors.
Re-Identification (Re-ID): Tracking identities across different videos and environments.
Deepfake Detection: Developing methods to detect manipulated videos.
Generative AI: Training high-resolution video generation models.
Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.
Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.
Coverage
Explaining the scope and coverage of the dataset:
Geographic Coverage: Worldwide
Time Range: Time range and size of the videos have been noted in the CSV file.
Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.
Languages Covered (Videos):
English: 23,038 videos
Portuguese: 1,346 videos
Spanish: 677 videos
Norwegian: 1,266 videos
Swedish: 1,056 videos
Korean: 848 videos
Polish: 1,807 videos
Indonesian: 1,163 videos
French: 1,102 videos
German: 1,276 videos
Japanese: 1,433 videos
Dutch: 1,666 videos
Indian: 1,163 videos
Czech: 590 videos
Chinese: 685 videos
Italian: 975 videos
Filipino: 920 videos
Bulgarian: 340 videos
Romanian: 1,144 videos
Arabic: 1,691 videos
Who Can Use It
Examples of intended users and their use cases:
Data Scientists: Training machine learning models for video-based AI applications.
Researchers: Studying human behavior, facial analysis, or video AI advancements.
Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.
Additional Notes
Ensure ethical usage and compliance with privacy regulations. The dataset’s quality and scale make it valuable for high-performance AI training. Preprocessing (cropping, downsampling) may be needed for different use cases. The dataset is not yet complete and expands daily; please contact me for the most up-to-date CSV file. The dataset has been divided into 100GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
The Female Genital Mutilation Enhanced Dataset (FGMED) is a repository for data collected by healthcare providers in England where FGM was identified or a procedure for FGM was undertaken. Data collected includes FGM type, age (at which FGM was undertaken and at latest attendance), country (of birth and where FGM was undertaken) and if the patient was advised of the health implications and illegalities of FGM. This publication provides a CSV (comma-separated values) file containing data from FGMED for the period January to March 2025. Information on the fields, geographies and definitions is provided in the CSV metadata. The publication also includes the FGM dashboard. This tool allows users to analyse and visualise the data.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
National and subnational mid-year population estimates for the UK and its constituent countries by administrative area, age and sex (including components of population change, median age and population density).
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Data for this publication are extracted each month as a snapshot in time from the GP Payments system (Open Exeter) maintained by NHS Digital. This release is an accurate snapshot as at 01 April 2020. GP Practice; Primary Care Network (PCN); Sustainability and transformation partnership (STP); Clinical Commissioning Group (CCG) and NHS England Commissioning Region level data are released in single year of age (SYOA) and 5-year age bands, both of which finish at 95+, split by gender. In addition, organisational mapping data is available to derive STP; PCN; CCG; and Commissioning Region associated with a GP practice and is updated each month to give relevant organisational mapping. A change to NHS geographies was implemented on 01 April 2020 and the publication will reflect this mapping. Quarterly publications in January, April, July and October will include Lower Layer Super Output Area (LSOA) populations and a spotlight report. Note: An error was identified with the consistency of the mapping CSV file compared to other Patients registered at a GP practice monthly publications. This error was regarding the order and names of some of the columns and was corrected on 27/10/2020. The mapping file subsequently has the suffix _v2 to reflect this update.
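The relationship between single year of age and 5-year age bands, both capped at 95+, can be sketched as below. The band label format (`"0-4"`, `"95+"`) and the function name are assumptions for illustration; the labels used in the published CSV files may differ.

```python
def five_year_band(age):
    """Map a single year of age to a 5-year band, with an open-ended 95+ band."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= 95:
        return "95+"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


# Example: aggregate hypothetical single-year counts into bands.
single_year_counts = {0: 3, 4: 2, 5: 1, 94: 1, 95: 1, 101: 1}  # placeholder values
band_counts = {}
for age, count in single_year_counts.items():
    band = five_year_band(age)
    band_counts[band] = band_counts.get(band, 0) + count

print(band_counts)
```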
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Dataset contains counts and measures for individuals from the 2013, 2018, and 2023 Censuses. Data is available by statistical area 1.
The variables included in this dataset are for the census usually resident population count (unless otherwise stated). All data is for level 1 of the classification (unless otherwise stated).
The variables for part 1 of the dataset are:
Download lookup file for part 1 from Stats NZ ArcGIS Online or embedded attachment in Stats NZ geographic data service. Download data table (excluding the geometry column for CSV files) using the instructions in the Koordinates help guide.
Footnotes
Te Whata
Under the Mana Ōrite Relationship Agreement, Te Kāhui Raraunga (TKR) will be publishing Māori descent and iwi affiliation data from the 2023 Census in partnership with Stats NZ. This will be available on Te Whata, a TKR platform.
Geographical boundaries
Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.
Subnational census usually resident population
The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city.
Population counts
Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts.
Caution using time series
Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data).
Study participation time series
In the 2013 Census study participation was only collected for the census usually resident population count aged 15 years and over.
About the 2023 Census dataset
For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.
Data quality
The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.
Concept descriptions and quality ratings
Data quality ratings for 2023 Census variables has additional details about variables found within totals by topic, for example, definitions and data quality.
Disability indicator
This data should not be used as an official measure of disability prevalence. Disability prevalence estimates are only available from the 2023 Household Disability Survey. Household Disability Survey 2023: Final content has more information about the survey.
Activity limitations are measured using the Washington Group Short Set (WGSS). The WGSS asks about six basic activities that a person might have difficulty with: seeing, hearing, walking or climbing stairs, remembering or concentrating, washing all over or dressing, and communicating. A person was classified as disabled in the 2023 Census if there was at least one of these activities that they had a lot of difficulty with or could not do at all.
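The classification rule described above can be expressed as a small sketch. The response strings and the dictionary layout are hypothetical; the census dataset itself may encode responses differently.

```python
# The six WGSS activities named in the text.
WGSS_ACTIVITIES = (
    "seeing",
    "hearing",
    "walking or climbing stairs",
    "remembering or concentrating",
    "washing all over or dressing",
    "communicating",
)

# Hypothetical response labels for the two most severe levels.
SEVERE_RESPONSES = {"a lot of difficulty", "cannot do at all"}


def is_classified_disabled(responses):
    """True if at least one activity has a severe response."""
    return any(
        responses.get(activity) in SEVERE_RESPONSES
        for activity in WGSS_ACTIVITIES
    )


example = {activity: "no difficulty" for activity in WGSS_ACTIVITIES}
example["hearing"] = "a lot of difficulty"
print(is_classified_disabled(example))
```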
Using data for good
Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that “data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.
Confidentiality
The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.
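To illustrate why rounded figures may not sum to stated totals, here is a rough sketch of random rounding to base 3. The rule used (multiples of 3 are left unchanged; other counts go to the nearer multiple of 3 with probability 2/3 and to the other neighbouring multiple with probability 1/3) and the seeded generator are assumptions for illustration only; this is not Stats NZ's FRR3 implementation, and the confidentiality documentation referenced above is the authority on the actual procedure.

```python
import random


def random_round_base3(count, rng):
    """Randomly round a non-negative count to a neighbouring multiple of 3."""
    remainder = count % 3
    if remainder == 0:
        return count
    lower = count - remainder
    upper = lower + 3
    # remainder 1 is closer to the lower multiple, remainder 2 to the upper one.
    nearer, farther = (lower, upper) if remainder == 1 else (upper, lower)
    return nearer if rng.random() < 2 / 3 else farther


rng = random.Random(2023)  # seeded so the sketch is repeatable
cells = [4, 7, 11, 20]     # placeholder counts, not census data
rounded_cells = [random_round_base3(c, rng) for c in cells]
rounded_total = random_round_base3(sum(cells), rng)

print(rounded_cells, sum(rounded_cells), rounded_total)
```

Because each cell and the total are rounded independently, the sum of the rounded cells can differ from the rounded total.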
Measures
Measures like averages, medians, and other quantiles are calculated from unrounded counts, with input noise added to or subtracted from each contributing value during measures calculations. Averages and medians based on less than six units (e.g. individuals, dwellings, households, families, or extended families) are suppressed. This suppression threshold changes for other quantiles. Where the cells have been suppressed, a placeholder value has been used.
Percentages
To calculate percentages, divide the figure for the category of interest by the figure for 'Total stated' where this applies.
Symbol
-997 Not available
-999 Confidential
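A minimal sketch of the percentage calculation, treating the two placeholder codes listed above as missing values. The function name and the choice to return `None` for unavailable or confidential inputs are assumptions, not part of the Stats NZ guidance.

```python
NOT_AVAILABLE = -997
CONFIDENTIAL = -999
PLACEHOLDERS = {NOT_AVAILABLE, CONFIDENTIAL}


def percentage_of_total_stated(category_count, total_stated):
    """Percentage of 'Total stated', or None when a value is a placeholder."""
    if category_count in PLACEHOLDERS or total_stated in PLACEHOLDERS:
        return None
    if total_stated <= 0:
        return None  # avoid dividing by zero
    return category_count / total_stated * 100.0


print(percentage_of_total_stated(30, 120))
print(percentage_of_total_stated(CONFIDENTIAL, 120))
```

Filtering the placeholder codes before any arithmetic matters because they are negative numbers and would otherwise silently distort sums and ratios.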
Inconsistencies in definitions
Please note that there may be differences in definitions between census classifications and those used for other data collections.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Data for this publication are extracted each month as a snapshot in time from the Primary Care Registration database within the PDS (Patient Demographic Service) system. This release is an accurate snapshot as at 1 July 2024. GP Practice; Primary Care Network (PCN); Sub Integrated Care Board Locations (SICBL); Integrated Care Board (ICB) and NHS England Commissioning Region level data are released in single year of age (SYOA) and 5-year age bands, both of which finish at 95+, split by gender. In addition, organisational mapping data is available to derive PCN; SICBL; ICB and Commissioning Region associated with a GP practice and is updated each month to give relevant organisational mapping. Quarterly publications in January, April, July and October will include Lower Layer Super Output Area (LSOA) populations. This publication includes LSOA files produced from the new data source PDS. The files are available with the LSOA 2021 census codes and LSOA 2011 census codes – these are labelled in the zip file titles and the 2011 csv files have been marked with a suffix. Due to the change in data source and processing, there are differences to previous files. The totals in the LSOA files may not match the totals in practice files – we’re looking into explanations for these differences however initial analysis shows there are small differences at practice level. Unfortunately, we experienced issues this month and used an alternative process to generate the files – we have aimed for consistency with previous outputs and apologise for any minor differences you may encounter. English and Welsh LSOAs are included and all other instances are categorised as “OTHER”.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Data for this publication are extracted each month as a snapshot in time from the Primary Care Registration database within the NHAIS (National Health Application and Infrastructure Services) system. This release is an accurate snapshot as at 1st August 2021. GP Practice; Primary Care Network (PCN); Sustainability and transformation partnership (STP); Clinical Commissioning Group (CCG) and NHS England Commissioning Region level data are released in single year of age (SYOA) and 5-year age bands, both of which finish at 95+, split by gender. In addition, organisational mapping data is available to derive STP; PCN; CCG and Commissioning Region associated with a GP practice and is updated each month to give relevant organisational mapping. Quarterly publications in January, April, July and October will include Lower Layer Super Output Area (LSOA) populations and a spotlight report. The outbreak of Coronavirus (COVID-19) has led to changes in the work of General Practices and subsequently the data within this publication. Until activity in this healthcare setting stabilises, we urge caution in drawing any conclusions from these data without consideration of the country's circumstances and would recommend that any uses of these data are accompanied by an appropriate caveat. Note: An issue was identified with the Single Year of Age (GP-practice females) CSV file, which incorrectly excluded data. This issue was fixed and the file replaced on 13/08/2021. We apologise for any inconvenience.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data report contains the annotated figures, statistics and visualisations of the project "Die twitternde Zunft. Historikertage auf Twitter (2012-2018)" by Mareike König and Paul Ramisch. In addition, the methodological approach to corpus creation, data cleaning, coding, network and text analysis as well as the legal and ethical considerations of the project are described.
The datasheets contain the dehydrated and annotated tweet ids that were used for our study. With the Twitter API, these can be used to hydrate and restore the whole corpus, apart from deleted tweets. There are two versions of the CSV file: one with clean id values, the other with id values prepended with an “x”. The prefix prevents certain tools from converting the ids to scientific notation and breaking them; the read_twitter_csv() function of the R library rtweet resolves it automatically on import.
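For readers not using rtweet, the same idea can be sketched in Python: read the id column as text and strip the leading "x" rather than letting a tool parse the value as a floating-point number. The sample id below is a made-up placeholder, and the helper name is hypothetical; only the `status_id` column name comes from the file description.

```python
import csv
import io


def read_status_ids(csv_text):
    """Return status ids as strings, dropping an optional leading 'x'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        value = row["status_id"]
        ids.append(value[1:] if value.startswith("x") else value)
    return ids


# Placeholder content in the "x"-prefixed variant (not a real tweet id).
sample_csv = "status_id\nx1234567890123456789\n"
status_ids = read_status_ids(sample_csv)
print(status_ids)

# Why ids should stay text: a 64-bit float cannot hold this many digits exactly.
as_float = int(float(status_ids[0]))
print(as_float == int(status_ids[0]))
```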
The files contain the following data:
status_id: The Twitter status id of the tweet
corpus_user_id: A corpus specific id for each user within the corpus (not the Twitter user id)
hauptkategorie_1: Primary category
hauptkategorie_2: Primary category 2
Gender: Gender of the user
Nebenkategorie: Secondary category
Furthermore, the following boolean variables describe which sub-corpus each tweet is in: the main corpus per year, which consists of both data sources (TAGS and API), and the yearly sub-corpora divided by data source (TAGS: orig_, API: api_):
You can find the R code on GitHub: https://github.com/dhiparis/historikertag-twitter.