100+ datasets found
  1. Java, SD Population Breakdown by Gender Dataset: Male and Female Population...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    Cite
    Neilsberg Research (2025). Java, SD Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b23b79ff-f25d-11ef-8c1b-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Dakota, Java
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Java by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Java across both sexes and to determine which sex constitutes the majority.

    Key observations

    Females form a considerable majority, at 65.66% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks about respondents' current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender, and respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: This column displays the population of each gender in Java.
    • % of Total Population: This column displays each gender's share of Java's total population, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.
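    A minimal sketch of how the "% of Total Population" column can be derived from the "Population" column. It assumes each share is count / total × 100, rounded half-up to a fixed number of decimals independently for each gender; the rounding rule and the counts are illustrative assumptions, not Neilsberg Research's documented method or values. The usage lines show why rounded shares need not sum to exactly 100.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict


def gender_shares(counts: Dict[str, int], digits: int = 2) -> Dict[str, Decimal]:
    """Return each gender's share of the total population, in percent.

    Assumption: each share is rounded half-up on its own, so the rounded
    shares are not forced to add up to exactly 100.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    step = Decimal(1).scaleb(-digits)  # digits=2 -> 0.01, digits=0 -> 1
    return {
        gender: (Decimal(100) * n / total).quantize(step, rounding=ROUND_HALF_UP)
        for gender, n in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical counts whose exact shares are 12.5% and 87.5%.
    shares = gender_shares({"Male": 1, "Female": 7}, digits=0)
    print(shares["Male"], shares["Female"])  # 13 88
    print(sum(shares.values()))  # 101, not 100
```

    Decimal arithmetic is used so that exact halves such as 12.5 are rounded predictably rather than being affected by binary floating-point representation.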

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Java Population by Race & Ethnicity.

  2. Java, New York Population Breakdown by Gender and Age Dataset: Male and...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    Cite
    Neilsberg Research (2025). Java, New York Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e1e8e49e-f25d-11ef-8c1b-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York, Java
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Java town by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Java town. The dataset can be used to understand the population distribution of Java town by gender and age. For example, it can identify the largest age group for both men and women in Java town, and show how the gender ratio (males per 100 females) changes from the youngest to the oldest age group.

    Key observations

    Largest age group (population): Male: 40-44 years (139); Female: 65-69 years (126). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender:

    Please note that the American Community Survey asks about respondents' current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender, and respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the Java town population analysis. There are 18 expected values, as defined in the age groups section above.
    • Population (Male): This column displays the male population of Java town.
    • Population (Female): This column displays the female population of Java town.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Java town for each age group.
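    The Gender Ratio column follows the usual definition of the sex ratio: males per 100 females, i.e. 100 × male ÷ female. The sketch below computes it per age group and also picks the largest age group for each sex, as in the key observations above. The `AgeGroupRow` type and the sample rows are hypothetical; they are not the dataset's published schema or values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgeGroupRow:
    """One row of a hypothetical age-group table."""
    age_group: str
    male: int
    female: int

    @property
    def gender_ratio(self) -> Optional[float]:
        """Males per 100 females; undefined (None) when there are no females."""
        if self.female == 0:
            return None
        return 100 * self.male / self.female


def largest_group(rows: List[AgeGroupRow], sex: str) -> AgeGroupRow:
    """Return the age group with the largest population for 'male' or 'female'."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    return max(rows, key=lambda row: getattr(row, sex))


if __name__ == "__main__":
    # Hypothetical rows, not values from the dataset.
    rows = [
        AgeGroupRow("Under 5 years", 10, 20),
        AgeGroupRow("40 to 44 years", 30, 15),
    ]
    for row in rows:
        print(row.age_group, row.gender_ratio)
    print(largest_group(rows, "male").age_group)
    print(largest_group(rows, "female").age_group)
```

    Returning `None` for a zero female count avoids reporting an infinite ratio for small age groups.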

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Java town Population by Gender.

  3. rlvr-code-data-java-edited

    • huggingface.co
    Updated Aug 23, 2025
    Cite
    Finbarr Timbers (2025). rlvr-code-data-java-edited [Dataset]. https://huggingface.co/datasets/finbarr/rlvr-code-data-java-edited
    Dataset updated
    Aug 23, 2025
    Authors
    Finbarr Timbers
    Description

    finbarr/rlvr-code-data-java-edited dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. Data from: java-dataset

    • huggingface.co
    Updated May 8, 2024
    Cite
    Thieu Luu (2024). java-dataset [Dataset]. https://huggingface.co/datasets/echodrift/java-dataset
    Available format: Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    May 8, 2024
    Authors
    Thieu Luu
    Description

    echodrift/java-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Data from: DataTD: A Dataset of Java Projects Including Test Doubles

    • zenodo.org
    zip
    Updated Dec 4, 2024
    Cite
    Mengzhen Li; Mattia Fazzini (2024). DataTD: A Dataset of Java Projects Including Test Doubles [Dataset]. http://doi.org/10.5281/zenodo.14271508
    Available download formats: zip
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mengzhen Li; Mattia Fazzini
    License

    GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html

    Description

    This dataset contains 1,070 open-source Java projects that include test doubles. The projects were mined from GitHub. The starting point for building the dataset was all projects whose main language is Java and that had at least five stars as of October 29, 2023. This set of projects is listed in java_repositories_with_five_stars.txt. The 1,070 projects comprising this dataset use Maven as their build system, contain JUnit tests, and use Mockito to create test doubles. The projects are available in the project.zip archive file. The dataset also contains metadata about the projects, available in the projects.json file. The metadata describes the characteristics of each project together with the test double definitions, stubbings, and verifications inside the project. Finally, we also make available the source code used to build DataTD for future research on using and extending the dataset.
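    The selection criteria above (main language Java, at least five stars, then Maven projects with JUnit tests and Mockito) can be sketched as simple filters. This is a hypothetical illustration: the `Repo` record is invented here and is not the projects.json schema, and the `pom.xml` check is a rough text heuristic that looks for JUnit and Mockito Maven group IDs. It is not the authors' mining pipeline, whose source code is distributed with the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Maven groupIds: "junit" (JUnit 4), "org.junit..." (JUnit 5), "org.mockito" (Mockito).
JUNIT_MARKERS = ("<groupId>junit</groupId>", "<groupId>org.junit")
MOCKITO_MARKER = "<groupId>org.mockito</groupId>"


@dataclass
class Repo:
    """Hypothetical repository record (not the DataTD metadata schema)."""
    name: str
    main_language: str
    stars: int


def initial_candidates(repos: Iterable[Repo], min_stars: int = 5) -> List[Repo]:
    """Keep repositories whose main language is Java with at least min_stars stars."""
    return [r for r in repos if r.main_language == "Java" and r.stars >= min_stars]


def pom_mentions_junit_and_mockito(pom_xml: str) -> bool:
    """Rough heuristic: does a Maven pom.xml declare JUnit and Mockito dependencies?

    Plain substring matching; it does not handle whitespace inside tags,
    dependency scopes, or dependencies inherited from a parent POM or BOM.
    """
    has_junit = any(marker in pom_xml for marker in JUNIT_MARKERS)
    return has_junit and MOCKITO_MARKER in pom_xml


if __name__ == "__main__":
    # Hypothetical repositories for illustration only.
    repos = [Repo("a/lib", "Java", 12), Repo("b/app", "Kotlin", 40), Repo("c/tool", "Java", 3)]
    print([r.name for r in initial_candidates(repos)])
```

    A real pipeline would parse the POM as XML and also inspect test sources; the substring check only keeps the sketch short.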

  6. rlvr-code-data-Java

    • huggingface.co
    Cite
    Saurabh Shah, rlvr-code-data-Java [Dataset]. https://huggingface.co/datasets/saurabh5/rlvr-code-data-Java
    Authors
    Saurabh Shah
    Description

    saurabh5/rlvr-code-data-Java dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. Data from: CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java...

    • zenodo.org
    application/gzip, bin
    Updated Apr 28, 2025
    Cite
    Kaihang Jiang; Jin Bihui; Nie Pengyu (2025). CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java Repositories [Dataset]. http://doi.org/10.5281/zenodo.15293313
    Available download formats: bin, application/gzip
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kaihang Jiang; Jin Bihui; Nie Pengyu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern programming languages are constantly evolving, introducing new language features and APIs to enhance software development practices. Software developers often face the tedious task of upgrading their codebase to new programming language versions. Recently, large language models (LLMs) have demonstrated potential in automating various code generation and editing tasks, suggesting their applicability in automating code upgrade. However, there exists no benchmark for evaluating the code upgrade ability of LLMs, as distilling code changes related to programming language evolution from real-world software repositories’ commit histories is a complex challenge.
    In this work, we introduce CoUpJava, the first large-scale dataset for code upgrade, focusing on the code changes related to the evolution of Java. CoUpJava comprises 10,697 code upgrade samples, distilled from the commit histories of 1,379 open-source Java repositories and covering Java versions 7–23. The dataset is divided into two subsets: CoUpJava-Fine, which captures fine-grained method-level refactorings towards new language features; and CoUpJava-Coarse, which includes coarse-grained repository-level changes encompassing new language features, standard library APIs, and build configurations. Our proposed dataset provides high-quality samples by filtering irrelevant and noisy changes and verifying the compilability of upgraded code. Moreover, CoUpJava reveals diversity in code upgrade scenarios, ranging from small, fine-grained refactorings to large-scale repository modifications.

  8. Indonesia No of Student: Higher Education: East Java

    • ceicdata.com
    Updated May 15, 2018
    Cite
    CEICdata.com (2018). Indonesia No of Student: Higher Education: East Java [Dataset]. https://www.ceicdata.com/en/indonesia/number-of-student-by-province/no-of-student-higher-education-east-java
    Dataset updated
    May 15, 2018
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 1, 2007 - Mar 1, 2018
    Area covered
    Indonesia
    Variables measured
    Education Statistics
    Description

    Indonesia Number of Student: Higher Education: East Java data was reported at 822,635.000 Person in 2018, a decrease from 844,675.000 Person in 2017. The data is updated yearly; from Mar 1995 to 2018 it has a median of 446,119.500 Person across 24 observations. The data reached an all-time high of 844,675.000 Person in 2017 and a record low of 329,178.000 Person in 1996. Indonesia Number of Student: Higher Education: East Java data remains active in CEIC and is reported by the Central Bureau of Statistics. The data is categorized under Global Database’s Indonesia – Table ID.GAC003: Number of Student: by Province.
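    The description summarizes the yearly series by its median (446,119.500 Person over 24 observations), its all-time high, and its record low. With an even number of observations, the median is the mean of the two middle values, which is one way a count of persons can end in .500. A minimal sketch using Python's `statistics.median`; the series values are hypothetical.

```python
import statistics
from typing import Dict


def summarize(series: Dict[int, float]) -> dict:
    """Median, observation count, high and low of a yearly series keyed by year."""
    if not series:
        raise ValueError("series is empty")
    high_year = max(series, key=series.__getitem__)
    low_year = min(series, key=series.__getitem__)
    return {
        # For an even number of values, median() averages the two middle ones.
        "median": statistics.median(list(series.values())),
        "observations": len(series),
        "high": (high_year, series[high_year]),
        "low": (low_year, series[low_year]),
    }


if __name__ == "__main__":
    # Hypothetical yearly values, not CEIC data.
    print(summarize({2015: 100, 2016: 102, 2017: 110, 2018: 107}))
```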

  9. Java, SD Population Breakdown by Gender Dataset: Male and Female Population...

    • neilsberg.com
    csv, json
    Updated Feb 19, 2024
    Cite
    Neilsberg Research (2024). Java, SD Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/d078fbaa-c980-11ee-9145-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Feb 19, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Dakota, Java
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Java by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Java across both sexes and to determine which sex constitutes the majority.

    Key observations

    Females form a majority, at 60.54% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks about respondents' current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender, and respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: This column displays the population of each gender in Java.
    • % of Total Population: This column displays each gender's share of Java's total population, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Java Population by Race & Ethnicity.

  10. Indonesia BPS Projection: Population: Mid-Year: East Java: Pacitan Regency

    • ceicdata.com
    Updated May 15, 2018
    Cite
    CEICdata.com (2018). Indonesia BPS Projection: Population: Mid-Year: East Java: Pacitan Regency [Dataset]. https://www.ceicdata.com/en/indonesia/population-projection-midyear-east-java-by-regency-and-municipality-central-bureau-of-statistics/bps-projection-population-midyear-east-java-pacitan-regency
    Dataset updated
    May 15, 2018
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 1, 2009 - Jun 1, 2020
    Area covered
    Indonesia
    Variables measured
    Population
    Description

    Indonesia BPS Projection: Population: Mid-Year: East Java: Pacitan Regency data was reported at 555.984 Person th in 2020, an increase from 555.304 Person th in 2019. The data is updated yearly; from Jun 2008 to 2020 it has a median of 552.307 Person th across 13 observations. The data reached an all-time high of 558.644 Person th in 2009 and a record low of 540.516 Person th in 2010. Indonesia BPS Projection: Population: Mid-Year: East Java: Pacitan Regency data remains active in CEIC and is reported by the Central Bureau of Statistics. The data is categorized under Indonesia Premium Database’s Socio and Demographic – Table ID.GAB016: Population Projection: Mid-Year: East Java: by Regency and Municipality: Central Bureau of Statistics.
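    The year-over-year change can be worked out directly from the two reported values. Reading "Person th" as thousands of persons, the sketch below computes the absolute and percentage change from 2019 (555.304) to 2020 (555.984); it is a worked example, not a CEIC tool.

```python
def yoy_change(previous: float, current: float) -> tuple:
    """Return (absolute change, percentage change) between two observations."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    delta = current - previous
    return delta, 100 * delta / previous


if __name__ == "__main__":
    # Values reported for Pacitan Regency, in thousands of persons.
    delta, pct = yoy_change(555.304, 555.984)
    print(f"{delta:+.3f} thousand persons ({pct:+.3f}%)")  # +0.680 thousand persons (+0.122%)
```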

  11. raw data (Java source code) for MSR data track 2019

    • figshare.com
    zip
    Updated Feb 5, 2019
    Cite
    Andrea Capiluppi (2019). raw data (Java source code) for MSR data track 2019 [Dataset]. http://doi.org/10.6084/m9.figshare.7673264.v1
    Available download formats: zip
    Dataset updated
    Feb 5, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andrea Capiluppi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are the raw Java classes of the projects parsed from SourceForge. The projects are categorised by application domain.

  12. Indonesia BPS Projection: Population: Java

    • ceicdata.com
    Updated May 15, 2018
    Cite
    CEICdata.com (2018). Indonesia BPS Projection: Population: Java [Dataset]. https://www.ceicdata.com/en/indonesia/population-projection-by-province-central-bureau-of-statistics/bps-projection-population-java
    Dataset updated
    May 15, 2018
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2024 - Dec 1, 2035
    Area covered
    Indonesia
    Variables measured
    Population
    Description

    Indonesia BPS Projection: Population: Java data was reported at 167,325.600 Person th in 2035, an increase from 166,731.300 Person th in 2034. The data is updated yearly; from Dec 1980 to 2035 it has a median of 132,192.050 Person th across 56 observations. The data reached an all-time high of 167,325.600 Person th in 2035 and a record low of 91,609.500 Person th in 1980. Indonesia BPS Projection: Population: Java data remains active in CEIC and is reported by the Central Bureau of Statistics. The data is categorized under Global Database’s Indonesia – Table ID.GAA002: Population Projection: by Province: Central Bureau of Statistics.

  13. NOAA/WDS Paleoclimatology - East Java, Indonesia 1,200 Year Lake Sediment...

    • catalog.data.gov
    Updated Oct 1, 2023
    Cite
    NOAA National Centers for Environmental Information (Point of Contact); NOAA World Data Service for Paleoclimatology (Point of Contact) (2023). NOAA/WDS Paleoclimatology - East Java, Indonesia 1,200 Year Lake Sediment Geochemical Data [Dataset]. https://catalog.data.gov/dataset/noaa-wds-paleoclimatology-east-java-indonesia-1200-year-lake-sediment-geochemical-data
    Dataset updated
    Oct 1, 2023
    Dataset provided by
    National Centers for Environmental Information (https://www.ncei.noaa.gov/)
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Area covered
    East Java, Indonesia
    Description

    This archived Paleoclimatology Study is available from the NOAA National Centers for Environmental Information (NCEI), under the World Data Service (WDS) for Paleoclimatology. The associated NCEI study type is Lake. The data include parameters of paleolimnology with a geographic location of Jawa Timur, Indonesia. The time period coverage is from 1108 to -61 in calendar years before present (BP). See metadata information for parameter and study location details. Please cite this study when using the data.
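    Dates in years before present (BP) are counted back from 1950 CE, so the stated coverage of 1108 to -61 BP corresponds to 842 CE through 2011 CE, about 1,170 years. A small conversion sketch, valid for dates from 1 CE onward since the CE/BCE calendar has no year 0:

```python
BP_REFERENCE_YEAR = 1950  # "present" in the BP convention is the year 1950 CE


def bp_to_ce(years_bp: int) -> int:
    """Convert years before present (BP) to a calendar year CE.

    Negative BP values are years after 1950. Results before 1 CE are rejected,
    because BCE dates need an extra adjustment (there is no year 0).
    """
    year = BP_REFERENCE_YEAR - years_bp
    if year < 1:
        raise ValueError("date falls before 1 CE")
    return year


if __name__ == "__main__":
    start, end = bp_to_ce(1108), bp_to_ce(-61)
    print(start, end, end - start)  # 842 2011 1169
```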

  14. Code smells and quality attributes dataset

    • figshare.com
    zip
    Updated Nov 3, 2024
    Cite
    Ehsan Esmaili; Morteza Zakeri; Saeed Parsa (2024). Code smells and quality attributes dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24057336.v2
    Available download formats: zip
    Dataset updated
    Nov 3, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ehsan Esmaili; Morteza Zakeri; Saeed Parsa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1 Code smell dataset

    To create a high-quality code smell dataset, we merged five different datasets. These datasets are among the largest and most accurate in our paper “Predicting Code Quality Attributes Based on Code Smells”. Various software projects were analyzed automatically and manually to collect these labels. Table 1 shows the dataset details.

    Table 1. Merged datasets and their characteristics (samples; projects; code smells).

    • Palomba (2018) [1]: 40888 samples; 395 versions of 30 open-source projects; large class, complex class, class data should be private, inappropriate intimacy, lazy class, middle man, refused bequest, spaghetti code, speculative generality, comments, long method, long parameter list, feature envy, message chains
    • Madeyski [2]: 3291 samples; 523 open-source and industrial projects; blob, data class
    • Khomh [3]: samples not given; 54 versions of 4 open-source projects; anti-singleton, swiss army knife
    • Pecorelli [4]: 341 samples; 9 open-source projects; blob
    • Palomba (2017) [5]: samples not given; 6 open-source projects; dispersed coupling, shotgun surgery

    Code smell datasets have been prepared at two levels: class and method. The class level has 15 different smells as labels and 81 software metrics as features; the method level has five smells and 31 metrics. This dataset contains samples of Java classes and methods. A sample can be identified by its longname, which contains the project name, package name, Java file name, class name, and method name. The quantity of each smell ranges from 40 to 11000. The total number of samples is 37517, while the number of non-smells is nearly 3 million. As a result, our dataset is the largest in the study. Table 2 shows the details.

    Table 2. The number of smells and non-smells at class and method levels.

    Class level (81 metrics; 23438 smell samples in total):
    • Complex class: 1265
    • Class data should be private: 1839
    • Inappropriate intimacy: 780
    • Large class: 990
    • Lazy class: 774
    • Middle man: 193
    • Refused bequest: 1985
    • Spaghetti code: 3203
    • Speculative generality: 2723
    • Blob: 988
    • Data class: 938
    • Anti-singleton: 2993
    • Swiss army knife: 4601
    • Dispersed coupling: 41
    • Shotgun surgery: 125
    • Non-smell: 40506 [3] + 8334 [5] + 296854 [1] + 43862 [2] + 55214 [4] = 444770

    Method level (31 metrics; 14079 smell samples in total):
    • Comments: 107
    • Feature envy: 525
    • Long method: 11366
    • Long parameter list: 1983
    • Message chains: 98
    • Non-smell: 2469176

    2 Quality dataset

    This dataset contains over 1000 Java project instances; for each instance, the relative frequency of 20 code smells has been extracted along with the values of eight software quality attributes. The code quality dataset contains 20 smells as features and 8 quality attributes as labels: coverageability, extendability, effectiveness, flexibility, functionality, reusability, testability, and understandability. The samples are Java projects identified by their name and version. Features are the ratio of smelly and non-smelly classes or methods in a software project. The quality attributes are normalized scores calculated by QMOOD metrics [6] and by models extracted in [7], [8]. In total, 1014 samples of small and large open-source and industrial projects are included in this dataset. The data samples are used to train machine learning models that predict software quality attributes based on code smells.

    References
    [1] F. Palomba, G. Bavota, M. Di Penta, F. Fasano, R. Oliveto, and A. De Lucia, “A large-scale empirical study on the lifecycle of code smell co-occurrences,” Inf Softw Technol, vol. 99, pp. 1–10, Jul. 2018, doi: 10.1016/J.INFSOF.2018.02.004.
    [2] L. Madeyski and T. Lewowski, “MLCQ: Industry-Relevant Code Smell Data Set,” in ACM International Conference Proceeding Series, Association for Computing Machinery, Apr. 2020, pp. 342–347, doi: 10.1145/3383219.3383264.
    [3] F. Khomh, M. Di Penta, Y. G. Guéhéneuc, and G. Antoniol, “An exploratory study of the impact of antipatterns on class change- and fault-proneness,” Empir Softw Eng, vol. 17, no. 3, pp. 243–275, Jun. 2012, doi: 10.1007/s10664-011-9171-y.
    [4] F. Pecorelli, F. Palomba, F. Khomh, and A. De Lucia, “Developer-Driven Code Smell Prioritization,” Proceedings - 2020 IEEE/ACM 17th International Conference on Mining Software Repositories, MSR 2020, pp. 220–231, 2020, doi: 10.1145/3379597.3387457.
    [5] F. Palomba, M. Zanoni, F. A. Fontana, A. De Lucia, and R. Oliveto, “Smells like teen spirit: Improving bug prediction performance using the intensity of code smells,” in Proceedings - 2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016, Institute of Electrical and Electronics Engineers Inc., Jan. 2017, pp. 244–255, doi: 10.1109/ICSME.2016.27.
    [6] J. Bansiya and C. G. Davis, “A hierarchical model for object-oriented design quality assessment,” IEEE Transactions on Software Engineering, vol. 28, no. 1, pp. 4–17, Jan. 2002, doi: 10.1109/32.979986.
    [7] M. Zakeri-Nasrabadi and S. Parsa, “Learning to predict test effectiveness,” International Journal of Intelligent Systems, 2021, doi: 10.1002/INT.22722.
    [8] M. Zakeri-Nasrabadi and S. Parsa, “Testability Prediction Dataset,” Mar. 2021, doi: 10.5281/ZENODO.4650228.
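    A minimal sketch of the quality dataset's features, the relative frequency of each smell in one project. It assumes that each class or method carries a set of smell labels and that a smell's frequency is the number of units with that smell divided by all units in the project; this denominator is an assumption, and the labels below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def smell_frequencies(units: List[Set[str]], smells: List[str]) -> Dict[str, float]:
    """Relative frequency of each smell across one project's classes or methods.

    units: one set of smell names per class or method (empty set = non-smelly).
    Assumption: frequency = units having the smell / all units.
    """
    if not units:
        raise ValueError("project has no classes or methods")
    counts = Counter(smell for unit in units for smell in unit)
    return {smell: counts[smell] / len(units) for smell in smells}


if __name__ == "__main__":
    # Hypothetical labels for four classes of one project.
    project = [{"Blob"}, set(), {"Blob", "Data class"}, set()]
    print(smell_frequencies(project, ["Blob", "Data class", "Lazy class"]))
```

    Smells that never occur in a project get a frequency of 0.0, so every project yields a feature vector of the same length.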

  15. CodeOntology OpenJDK8 Dataset

    • springernature.figshare.com
    • data.niaid.nih.gov
    • and 3 more sources
    txt
    Updated Jun 2, 2023
    Cite
    Mattia Atzeni; Maurizio Atzori (2023). CodeOntology OpenJDK8 Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.5234878
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mattia Atzeni; Maurizio Atzori
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset extracted from the source code of OpenJDK 8 (http://openjdk.java.net/), generated using the CodeOntology parser. This dataset splits the dataset at https://doi.org/10.5281/zenodo.579977 into 4 files:

    • structuralInformation.nt - Structural information on source code: 1981108 triples
    • annotations.nt - DBpedia links: 309688 triples
    • sourceCodeLiterals.nt - Actual source code as literals: 134757 triples
    • comments.nt - Literal comments: 105881 triples

    The dataset therefore includes four kinds of triples: structural information extracted from source code, DBpedia links generated from Javadoc comments, actual source code as literals, and literal comments.

    Background: The associated publication describes the development of CodeOntology as a community-shared software framework supporting expressive queries over source code. This dataset is the output of the CodeOntology parser, which analyzes Java source code and serializes it into RDF triples, applied to the source code of OpenJDK 8; the result is a structured dataset of more than 2 million RDF triples. CodeOntology can generate Linked Data from any Java project, enabling highly expressive queries over source code in a query language such as SPARQL.

    A tutorial video is available at https://youtu.be/bd6pvUDy8kA. More information is available on the CodeOntology website: http://codeontology.org/
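Because the four files are plain N-Triples, their triple counts and predicate distribution can be checked with very little code. The following is a minimal sketch using only the Python standard library. `count_predicates` is a hypothetical helper, and the sample IRIs are made-up `example.org` placeholders, not CodeOntology's actual vocabulary. For real querying, the files would be loaded into an RDF store and queried with SPARQL.

```python
from collections import Counter


def count_predicates(lines):
    """Count triples per predicate in N-Triples text.

    Simplification: in N-Triples, subjects (IRIs or blank nodes) and
    predicates (IRIs) never contain whitespace, so the first two
    whitespace-separated tokens of a triple line are the subject and the
    predicate. Blank lines and '#' comment lines are skipped. This does not
    validate the syntax; use a full RDF parser for that.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _subject, predicate, _rest = line.split(None, 2)
        counts[predicate] += 1
    return counts


# Hypothetical sample triples (placeholder IRIs).
sample = """\
<http://example.org/Foo> <http://example.org/ns#hasMethod> <http://example.org/Foo.bar> .
<http://example.org/Foo> <http://example.org/ns#hasMethod> <http://example.org/Foo.baz> .
<http://example.org/Foo.bar> <http://www.w3.org/2000/01/rdf-schema#comment> "Returns the bar value." .
"""

counts = count_predicates(sample.splitlines())
for predicate, n in counts.most_common():
    print(n, predicate)
```

On a downloaded file, pass an open handle instead, e.g. `count_predicates(open("comments.nt", encoding="utf-8"))`. Since N-Triples puts one triple per line, the sum of the returned counts should match the triple count listed for that file.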

  16. Indonesia Water Statistic: Consumption: Weat Java

    • ceicdata.com
    Cite
    CEICdata.com, Indonesia Water Statistic: Consumption: Weat Java [Dataset]. https://www.ceicdata.com/en/indonesia/water-consumption/water-statistic-consumption-weat-java
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2005 - Dec 1, 2017
    Area covered
    Indonesia
    Variables measured
    Materials Consumption
    Description

    Indonesia Water Statistic: Consumption: Weat Java data was reported at 1,710,489.000 IDR mn in 2017, an increase from the previous figure of 1,574,895.000 IDR mn for 2015. The series is updated yearly and has a median of 398,069.500 IDR mn over 22 observations from Dec 1995 to 2017. It reached an all-time high of 1,710,489.000 IDR mn in 2017 and a record low of 114,910.000 IDR mn in 1995. The series remains active in CEIC and is reported by the Central Bureau of Statistics. It is categorized under Global Database’s Indonesia – Table ID.RIG002: Water Consumption.
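The reported figures can be sanity-checked with a little arithmetic. The 2017 West Java (listed as "Weat Java") value is about 8.6% above the 2015 value. Because the series has an even number of observations (22), its median is the mean of the two middle values, which would explain a median ending in .500. The minimal Python sketch below uses the values quoted in the West Java and Central Java CEIC descriptions; the helper `pct_change` is illustrative, and the four-value list at the end is hypothetical.

```python
from statistics import median


def pct_change(previous, current):
    """Percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100


# Values quoted in the CEIC descriptions (IDR mn), 2015 -> 2017.
west_java = pct_change(1_574_895.0, 1_710_489.0)
central_java = pct_change(1_090_617.0, 1_085_146.0)
print(f"West Java:    {west_java:+.2f}%")
print(f"Central Java: {central_java:+.2f}%")

# Even-length series: the median is the mean of the two middle values.
# These four numbers are hypothetical, not CEIC data.
print(median([10, 20, 31, 40]))
```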

  17. new-java-data-41k

    • huggingface.co
    Updated May 31, 2024
    Cite
    Thieu Luu (2024). new-java-data-41k [Dataset]. https://huggingface.co/datasets/echodrift/new-java-data-41k
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 31, 2024
    Authors
    Thieu Luu
    Description

    The echodrift/new-java-data-41k dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  18. Java Ocean Atlas - Reid/Mantyla Section Data, a Library of More than 2000...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Oct 2, 2025
    Cite
    (Point of Contact) (2025). Java Ocean Atlas - Reid/Mantyla Section Data, a Library of More than 2000 Oceanographic Sections Developed from the Cruises Used in the Reid/Mantyla Pre-WOCE Data Set (NCEI Accession 0001456) [Dataset]. https://catalog.data.gov/dataset/java-ocean-atlas-reid-mantyla-section-data-a-library-of-more-than-2000-oceanographic-sections-d
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    (Point of Contact)
    Description

    This dataset includes data from approximately 12,000 stations that J. L. Reid and A. W. Mantyla have used in various world ocean studies. These data were accumulated for global ocean studies and are not intended for fine-scale analyses. Each station represents the best station available for that locality at the time of selection. The set was compiled over many years from many sources and has been brought up to date as new data became available. Most of the data were obtained from the National Oceanographic Data Center (NODC). The rest came directly from various P.I.s in various formats and may lack some NODC parameters such as ship, country, and institution codes and the NODC accession number. Note that these are edited data files, and an accurate account of deletions and corrections is, unfortunately, not available. In some cases these data may not agree exactly with versions published later or data supplied later by the NODC or an originator. Only stations that reach close to the bottom were chosen; unfortunately, this means the set is rather sparse near the equator. The temperature and salinity measurements are believed to be acceptable. However, some of the oxygen and nutrient data are quite poor. They have not been eliminated from the data set but were simply ignored in hand-contouring. They would have been eliminated had the trouble they cause in computer contouring or instant atlases been understood, but this set was begun before such methods were generally available. A few known systematic errors, such as IGY oxygens or early Discovery oxygens and silicates, have been adjusted based on deep comparisons with more modern data. In a few localities, stations have been reoccupied many times and a mean composite profile is given; at other localities, only the most recent or the best-sampled profile is kept and all others are deleted. Because of the large-scale scope intended for this data array, some closely spaced stations have been omitted. When needed, those stations can be retrieved from tapes of the entire cruise.
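The description says a mean composite profile is given where a station was reoccupied many times, but it does not describe the averaging procedure. The sketch below shows one plausible approach, not the Reid/Mantyla method: it assumes each cast has already been interpolated to common standard depths and averages the values depth by depth, using only the casts that reached each depth. `composite_profile` and the sample casts are hypothetical.

```python
from collections import defaultdict


def composite_profile(casts):
    """Average several casts into one profile, depth level by depth level.

    Each cast maps a standard depth (m) to a measured value. A depth that a
    cast did not reach is simply absent from that cast's dict, so the mean
    at each depth uses only the casts that sampled it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cast in casts:
        for depth, value in cast.items():
            sums[depth] += value
            counts[depth] += 1
    return {depth: sums[depth] / counts[depth] for depth in sorted(sums)}


# Hypothetical temperature casts (degrees C) from one reoccupied locality.
casts = [
    {0: 20.0, 500: 8.0, 1000: 4.0},
    {0: 22.0, 500: 8.4},
    {0: 21.0, 500: 7.9, 1000: 4.2},
]
print(composite_profile(casts))
```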

  19. Indonesia Water Statistic: Consumption: Central Java

    • ceicdata.com
    Updated Dec 15, 2022
    Cite
    CEICdata.com (2022). Indonesia Water Statistic: Consumption: Central Java [Dataset]. https://www.ceicdata.com/en/indonesia/water-consumption/water-statistic-consumption-central-java
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2005 - Dec 1, 2017
    Area covered
    Indonesia
    Variables measured
    Materials Consumption
    Description

    Indonesia Water Statistic: Consumption: Central Java data was reported at 1,085,146.000 IDR mn in 2017, a decrease from the previous figure of 1,090,617.000 IDR mn for 2015. The series is updated yearly and has a median of 279,042.500 IDR mn over 22 observations from Dec 1995 to 2017. It reached an all-time high of 1,090,617.000 IDR mn in 2015 and a record low of 63,310.000 IDR mn in 1995. The series remains active in CEIC and is reported by the Central Bureau of Statistics. It is categorized under Global Database’s Indonesia – Table ID.RIG002: Water Consumption.

  20. Data from: FOUNTAIN: A JAVA open-source package to assist large sequencing...

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 6, 2025
    Cite
    National Institutes of Health (2025). FOUNTAIN: A JAVA open-source package to assist large sequencing projects [Dataset]. https://catalog.data.gov/dataset/fountain-a-java-open-source-package-to-assist-large-sequencing-projects
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: Better automation, lower cost per reaction, and heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories have difficulty managing the processing and storage of their sequencing output. The challenges include documenting clones, templates, and sequencing reactions, and storing, annotating, and analyzing the large number of generated sequences.

    Results: We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the Java programming language and stores data in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project, using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented.

    Conclusions: A simple but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform- and database-independent, FOUNTAIN is offered in the hope that it will be improved and extended in a community effort.
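FOUNTAIN's polymorphism-detection algorithm is described here only as "simple". As an illustration of the general idea (not FOUNTAIN's actual code), the sketch below flags alignment columns in which at least two different bases are observed, ignoring gaps (`-`) and ambiguous `N` calls. It assumes the sequences are already aligned to equal length. `polymorphic_positions` and its `min_count` parameter are hypothetical names.

```python
def polymorphic_positions(aligned, min_count=1):
    """Return 0-based alignment columns where more than one base is seen.

    `aligned` is a list of equal-length, already-aligned sequences. Gaps
    ('-') and ambiguous calls ('N') are ignored. A column counts as
    polymorphic when at least two different bases each occur at least
    `min_count` times in it.
    """
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    positions = []
    for i in range(len(aligned[0]) if aligned else 0):
        counts = {}
        for seq in aligned:
            base = seq[i].upper()
            if base in skip:
                continue
            counts[base] = counts.get(base, 0) + 1
        if sum(1 for c in counts.values() if c >= min_count) >= 2:
            positions.append(i)
    return positions


# Hypothetical aligned reads.
reads = ["ACGTAC", "ACGTGC", "ACCTGC", "AC-TAC"]
print(polymorphic_positions(reads))
print(polymorphic_positions(reads, min_count=2))
```

Raising `min_count` trades sensitivity for robustness: a base seen only once in a column is more likely to be a sequencing error than a true polymorphism.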

Cite
Neilsberg Research (2025). Java, SD Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b23b79ff-f25d-11ef-8c1b-3860777c1fe6/

Java, SD Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition

Available download formats: csv, json
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
South Dakota, Java
Variables measured
Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Java by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Java across both sexes and to determine which sex constitutes the majority.

Key observations

Females form a considerable majority, at 65.66% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Scope of gender:

Please note that the American Community Survey asks respondents about their current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data on biological sex, not gender. Respondents are asked to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for the gender distribution analysis. No further analysis is performed on the data reported by the Census Bureau.

Variables / Data Columns

  • Gender: This column displays the gender (Male / Female).
  • Population: The population of each gender in Java.
  • % of Total Population: The percentage of Java's total population made up by each gender. Please note that the percentages may not sum to exactly 100% due to rounding.
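The "% of Total Population" column is each gender's count divided by the total, expressed as a percentage and rounded. A minimal sketch with hypothetical counts (not the published Java, SD figures) follows; `gender_shares` is an illustrative helper, not part of the dataset.

```python
def gender_shares(counts, decimals=2):
    """Each group's share of the total, as a percentage rounded to `decimals` places."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    return {group: round(n / total * 100, decimals) for group, n in counts.items()}


# Hypothetical counts, not the published Java, SD figures.
shares = gender_shares({"Male": 103, "Female": 197})
majority = max(shares, key=shares.get)
print(shares, majority)
```

Because each share is rounded independently, the rounded shares can sum to slightly more or less than 100%, which is the rounding caveat noted for that column.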

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Java Population by Race & Ethnicity. You can refer to it here.
