100+ datasets found
  1. N

    Mercedes, TX Population Breakdown By Race (Excluding Ethnicity) Dataset:...

    • neilsberg.com
    csv, json
    Updated Jul 7, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Mercedes, TX Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/2e2c253e-230c-11ef-bd92-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jul 7, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Texas, Mercedes
    Variables measured
    Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories idetified by the US Census Bureau. It is ensured that the population estimates used in this dataset pertain exclusively to the identified racial categories, and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Mercedes by race. It includes the population of Mercedes across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Mercedes across relevant racial categories.

    Key observations

    The percent distribution of Mercedes population by race (across all racial categories recognized by the U.S. Census Bureau): 56.37% are white, 0.04% are Asian, 0.02% are Native Hawaiian and other Pacific Islander, 12.11% are some other race and 31.47% are multiracial.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (excluding ethnicity) for the Mercedes
    • Population: The population of the racial category (excluding ethnicity) in the Mercedes is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each race as a proportion of Mercedes total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Mercedes Population by Race & Ethnicity. You can refer the same here

  2. N

    Garfield, NJ Non-Hispanic Population Breakdown By Race Dataset: Non-Hispanic...

    • neilsberg.com
    csv, json
    Updated Feb 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). Garfield, NJ Non-Hispanic Population Breakdown By Race Dataset: Non-Hispanic Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/garfield-nj-population-by-race/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Jersey, Garfield
    Variables measured
    Non-Hispanic Asian Population, Non-Hispanic Black Population, Non-Hispanic White Population, Non-Hispanic Some other race Population, Non-Hispanic Two or more races Population, Non-Hispanic American Indian and Alaska Native Population, Non-Hispanic Native Hawaiian and Other Pacific Islander Population, Non-Hispanic Asian Population as Percent of Total Non-Hispanic Population, Non-Hispanic Black Population as Percent of Total Non-Hispanic Population, Non-Hispanic White Population as Percent of Total Non-Hispanic Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Non-Hispanic population and (b) population as a percentage of the total Non-Hispanic population, we initially analyzed and categorized the data for each of the racial categories idetified by the US Census Bureau. It is ensured that the population estimates used in this dataset pertain exclusively to the identified racial categories, and are part of Non-Hispanic classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Non-Hispanic population of Garfield by race. It includes the distribution of the Non-Hispanic population of Garfield across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of Garfield across relevant racial categories.

    Key observations

    Of the Non-Hispanic population in Garfield, the largest racial group is White alone with a population of 14,422 (81.32% of the total Non-Hispanic population).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (for Non-Hispanic) for the Garfield
    • Population: The population of the racial category (for Non-Hispanic) in the Garfield is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each race as a proportion of Garfield total Non-Hispanic population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Garfield Population by Race & Ethnicity. You can refer the same here

  3. N

    Ohio, New York Non-Hispanic Population Breakdown By Race Dataset:...

    • neilsberg.com
    csv, json
    Updated Jul 7, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Ohio, New York Non-Hispanic Population Breakdown By Race Dataset: Non-Hispanic Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e10ae57d-2310-11ef-bd92-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jul 7, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ohio
    Variables measured
    Non-Hispanic Asian Population, Non-Hispanic Black Population, Non-Hispanic White Population, Non-Hispanic Some other race Population, Non-Hispanic Two or more races Population, Non-Hispanic American Indian and Alaska Native Population, Non-Hispanic Native Hawaiian and Other Pacific Islander Population, Non-Hispanic Asian Population as Percent of Total Non-Hispanic Population, Non-Hispanic Black Population as Percent of Total Non-Hispanic Population, Non-Hispanic White Population as Percent of Total Non-Hispanic Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Non-Hispanic population and (b) population as a percentage of the total Non-Hispanic population, we initially analyzed and categorized the data for each of the racial categories idetified by the US Census Bureau. It is ensured that the population estimates used in this dataset pertain exclusively to the identified racial categories, and are part of Non-Hispanic classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Non-Hispanic population of Ohio town by race. It includes the distribution of the Non-Hispanic population of Ohio town across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of Ohio town across relevant racial categories.

    Key observations

    With a zero Hispanic population, Ohio town is 100% Non-Hispanic. Among the Non-Hispanic population, the largest racial group is White alone with a population of 838 (98.01% of the total Non-Hispanic population).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (for Non-Hispanic) for the Ohio town
    • Population: The population of the racial category (for Non-Hispanic) in the Ohio town is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each race as a proportion of Ohio town total Non-Hispanic population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Ohio town Population by Race & Ethnicity. You can refer the same here

  4. P

    BIG Dataset

    • paperswithcode.com
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ho Kei Cheng; Jihoon Chung; Yu-Wing Tai; Chi-Keung Tang, BIG Dataset [Dataset]. https://paperswithcode.com/dataset/big
    Explore at:
    Authors
    Ho Kei Cheng; Jihoon Chung; Yu-Wing Tai; Chi-Keung Tang
    Description

    A high-resolution semantic segmentation dataset with 50 validation and 100 test objects. Image resolution in BIG ranges from 2048×1600 to 5000×3600. Every image in the dataset has been carefully labeled by a professional while keeping the same guidelines as PASCAL VOC 2012 without the void region.

  5. Data from: ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

    • zenodo.org
    • explore.openaire.eu
    • +1more
    csv, zip
    Updated Jan 27, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hossein Keshavarz; Hossein Keshavarz; Meiyappan Nagappan; Meiyappan Nagappan (2022). ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction [Dataset]. http://doi.org/10.5281/zenodo.5907002
    Explore at:
    zip, csvAvailable download formats
    Dataset updated
    Jan 27, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Hossein Keshavarz; Hossein Keshavarz; Meiyappan Nagappan; Meiyappan Nagappan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction
    
    This archive contains the ApacheJIT dataset presented in the paper "ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction" as well as the replication package. The paper is submitted to MSR 2022 Data Showcase Track.
    
    The datasets are available under directory dataset. There are 4 datasets in this directory. 
    
    1. apachejit_total.csv: This file contains the entire dataset. Commits are specified by their identifier and a set of commit metrics that are explained in the paper are provided as features. Column buggy specifies whether or not the commit introduced any bug into the system. 
    2. apachejit_train.csv: This file is a subset of the entire dataset. It provides a balanced set that we recommend for models that are sensitive to class imbalance. This set is obtained from the first 14 years of data (2003 to 2016).
    3. apachejit_test_large.csv: This file is a subset of the entire dataset. The commits in this file are the commits from the last 3 years of data. This set is not balanced to represent a real-life scenario in a JIT model evaluation where the model is trained on historical data to be applied on future data without any modification.
    4. apachejit_test_small.csv: This file is a subset of the test file explained above. Since the test file has more than 30,000 commits, we also provide a smaller test set which is still unbalanced and from the last 3 years of data.
    
    In addition to the dataset, we also provide the scripts using which we built the dataset. These scripts are written in Python 3.8. Therefore, Python 3.8 or above is required. To set up the environment, we have provided a list of required packages in file requirements.txt. Additionally, one filtering step requires GumTree [1]. For Java, GumTree requires Java 11. For other languages, external tools are needed. Installation guide and more details can be found here.
    
    The scripts are comprised of Python scripts under directory src and Python notebooks under directory notebooks. The Python scripts are mainly responsible for conducting GitHub search via GitHub search API and collecting commits through PyDriller Package [2]. The notebooks link the fixed issue reports with their corresponding fixing commits and apply some filtering steps. The bug-inducing candidates then are filtered again using gumtree.py script that utilizes the GumTree package. Finally, the remaining bug-inducing candidates are combined with the clean commits in the dataset_construction notebook to form the entire dataset.
    
    More specifically, git_token.py handles GitHub API token that is necessary for requests to GitHub API. Script collector.py performs GitHub search. Tracing changed lines and git annotate is done in gitminer.py using PyDriller. Finally, gumtree.py applies 4 filtering steps (number of lines, number of files, language, and change significance).
    
    References:
    
    1. GumTree
    
    * https://github.com/GumTreeDiff/gumtree
    
    Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In ACM/IEEE International Conference on Automated Software Engineering, ASE ’14,Vasteras, Sweden - September 15 - 19, 2014. 313–324
    
    2. PyDriller
    
    * https://pydriller.readthedocs.io/en/latest/
    
    * Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Lake Buena Vista, FL, USA)(ESEC/FSE2018). Association for Computing Machinery, New York, NY, USA, 908–911
    
    
  6. i

    DIDA: The largest historical handwritten digit dataset with 250k digits

    • ieee-dataport.org
    Updated Jan 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amir Yavariabdi (2022). DIDA: The largest historical handwritten digit dataset with 250k digits [Dataset]. https://ieee-dataport.org/documents/dida-largest-historical-handwritten-digit-dataset-250k-digits
    Explore at:
    Dataset updated
    Jan 6, 2022
    Authors
    Amir Yavariabdi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    250

  7. Data from: San Francisco Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasets/datasf/san-francisco
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    DataSF
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    San Francisco
    Description

    Context

    DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

    https://datasf.org/about/

    Content

    This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

    • This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.
    • This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.
    • This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).
    • This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    http://datasf.org/

    Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @meric from Unplash.

    Inspiration

    Which neighborhoods have the highest proportion of offensive graffiti?

    Which complaint is most likely to be made using Twitter and in which neighborhood?

    What are the most complained about Muni stops in San Francisco?

    What are the top 10 incident types that the San Francisco Fire Department responds to?

    How many medical incidents and structure fires are there in each neighborhood?

    What’s the average response time for each type of dispatched vehicle?

    Which category of police incidents have historically been the most common in San Francisco?

    What were the most common police incidents in the category of LARCENY/THEFT in 2016?

    Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

    What is the average tree diameter?

    What is the highest number of a particular species of tree planted in a single year?

    Which San Francisco locations feature the largest number of trees?

  8. USA Name Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Data.govhttps://data.gov/
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

    Content

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

    https://cloud.google.com/bigquery/public-data/usa-names

    Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @dcp from Unplash.

    Inspiration

    What are the most common names?

    What are the most common female names?

    Are there more female or male names?

    Female names by a wide margin?

  9. P

    Data from: ImageNet Dataset

    • paperswithcode.com
    Updated Apr 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li (2024). ImageNet Dataset [Dataset]. https://paperswithcode.com/dataset/imagenet
    Explore at:
    Dataset updated
    Apr 15, 2024
    Authors
    Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li
    Description

    The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.

    Total number of non-empty WordNet synsets: 21841 Total number of images: 14197122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million

  10. Z

    MoreFixes: Largest CVE dataset with fixes

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Oct 23, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Akhoundali, Jafar (2024). MoreFixes: Largest CVE dataset with fixes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11199119
    Explore at:
    Dataset updated
    Oct 23, 2024
    Dataset provided by
    Rietveld, Kristian F. D.
    GADYATSKAYA, Olga
    Rahim Nouri, Sajad
    Akhoundali, Jafar
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    In our work, we have designed and implemented a novel workflow with several heuristic methods to combine state-of-the-art methods related to CVE fix commits gathering. As a consequence of our improvements, we have been able to gather the largest programming language-independent real-world dataset of CVE vulnerabilities with the associated fix commits. Our dataset containing 29,203 unique CVEs coming from 7,238 unique GitHub projects is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 35,276 unique commits as sql and 39,931 patch commit files that fixed those vulnerabilities(some patch files can't be saved as sql due to several techincal reasons) Our larger dataset thus substantially improves over the current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We used NVD(nvd.nist.gov) and Github Secuirty advisory Database as the main sources of our pipeline.

    We release to the community a 16GB PostgreSQL database that contains information on CVEs up to 2024-09-26, CWEs of each CVE, files and methods changed by each commit, and repository metadata. Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we make our dataset collection tool also available to the community.

    cvedataset-patches.zip file contains fix patches, and postgrescvedumper.sql.zip contains a postgtesql dump of fixes, together with several other fields such as CVEs, CWEs, repository meta-data, commit data, file changes, method changed, etc.

    MoreFixes data-storage strategy is based on CVEFixes to store CVE commits fixes from open-source repositories, and uses a modified version of Porspector(part of ProjectKB from SAP) as a module to detect commit fixes of a CVE. Our full methodology is presented in the paper, with the title of "MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery", which will be published in the Promise conference (2024).

    For more information about usage and sample queries, visit the Github repository: https://github.com/JafarAkhondali/Morefixes

    If you are using this dataset, please be aware that the repositories that we mined contain different licenses and you are responsible to handle any licesnsing issues. This is also the similar case with CVEFixes.

    This product uses the NVD API but is not endorsed or certified by the NVD.

    This research was partially supported by the Dutch Research Council (NWO) under the project NWA.1215.18.008 Cyber Security by Integrated Design (C-SIDe).

    To restore the dataset, you can use the docker-compose file available at the gitub repository. Dataset default credentials after restoring dump:

    POSTGRES_USER=postgrescvedumper POSTGRES_DB=postgrescvedumper POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d

    Please use this for citation:

     title={MoreFixes: A large-scale dataset of CVE fix commits mined through enhanced repository discovery},
     author={Akhoundali, Jafar and Nouri, Sajad Rahim and Rietveld, Kristian and Gadyatskaya, Olga},
     booktitle={Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering},
     pages={42--51},
     year={2024}
    }
    
  11. o

    US Cities: Demographics

    • public.opendatasoft.com
    • data.smartidf.services
    • +3more
    csv, excel, json
    Updated Jul 27, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2017). US Cities: Demographics [Dataset]. https://public.opendatasoft.com/explore/dataset/us-cities-demographics/
    Explore at:
    excel, csv, jsonAvailable download formats
    Dataset updated
    Jul 27, 2017
    License

    https://en.wikipedia.org/wiki/Public_domainhttps://en.wikipedia.org/wiki/Public_domain

    Area covered
    United States
    Description

    This dataset contains information about the demographics of all US cities and census-designated places with a population greater or equal to 65,000. This data comes from the US Census Bureau's 2015 American Community Survey. This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.

  12. P

    MassiveText Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Feb 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jack W. Rae; Sebastian Borgeaud; Trevor Cai; Katie Millican; Jordan Hoffmann; Francis Song; John Aslanides; Sarah Henderson; Roman Ring; Susannah Young; Eliza Rutherford; Tom Hennigan; Jacob Menick; Albin Cassirer; Richard Powell; George van den Driessche; Lisa Anne Hendricks; Maribeth Rauh; Po-Sen Huang; Amelia Glaese; Johannes Welbl; Sumanth Dathathri; Saffron Huang; Jonathan Uesato; John Mellor; Irina Higgins; Antonia Creswell; Nat McAleese; Amy Wu; Erich Elsen; Siddhant Jayakumar; Elena Buchatskaya; David Budden; Esme Sutherland; Karen Simonyan; Michela Paganini; Laurent SIfre; Lena Martens; Xiang Lorraine Li; Adhiguna Kuncoro; Aida Nematzadeh; Elena Gribovskaya; Domenic Donato; Angeliki Lazaridou; Arthur Mensch; Jean-Baptiste Lespiau; Maria Tsimpoukelli; Nikolai Grigorev; Doug Fritz; Thibault Sottiaux; Mantas Pajarskas; Toby Pohlen; Zhitao Gong; Daniel Toyama; Cyprien de Masson d'Autume; Yujia Li; Tayfun Terzi; Vladimir Mikulik; Igor Babuschkin; Aidan Clark; Diego de Las Casas; Aurelia Guy; Chris Jones; James Bradbury; Matthew Johnson; Blake Hechtman; Laura Weidinger; Iason Gabriel; William Isaac; Ed Lockhart; Simon Osindero; Laura Rimell; Chris Dyer; Oriol Vinyals; Kareem Ayoub; Jeff Stanway; Lorrayne Bennett; Demis Hassabis; Koray Kavukcuoglu; Geoffrey Irving (2024). MassiveText Dataset [Dataset]. https://paperswithcode.com/dataset/massivetext
    Explore at:
    Dataset updated
    Feb 14, 2024
    Authors
    Jack W. Rae; Sebastian Borgeaud; Trevor Cai; Katie Millican; Jordan Hoffmann; Francis Song; John Aslanides; Sarah Henderson; Roman Ring; Susannah Young; Eliza Rutherford; Tom Hennigan; Jacob Menick; Albin Cassirer; Richard Powell; George van den Driessche; Lisa Anne Hendricks; Maribeth Rauh; Po-Sen Huang; Amelia Glaese; Johannes Welbl; Sumanth Dathathri; Saffron Huang; Jonathan Uesato; John Mellor; Irina Higgins; Antonia Creswell; Nat McAleese; Amy Wu; Erich Elsen; Siddhant Jayakumar; Elena Buchatskaya; David Budden; Esme Sutherland; Karen Simonyan; Michela Paganini; Laurent SIfre; Lena Martens; Xiang Lorraine Li; Adhiguna Kuncoro; Aida Nematzadeh; Elena Gribovskaya; Domenic Donato; Angeliki Lazaridou; Arthur Mensch; Jean-Baptiste Lespiau; Maria Tsimpoukelli; Nikolai Grigorev; Doug Fritz; Thibault Sottiaux; Mantas Pajarskas; Toby Pohlen; Zhitao Gong; Daniel Toyama; Cyprien de Masson d'Autume; Yujia Li; Tayfun Terzi; Vladimir Mikulik; Igor Babuschkin; Aidan Clark; Diego de Las Casas; Aurelia Guy; Chris Jones; James Bradbury; Matthew Johnson; Blake Hechtman; Laura Weidinger; Iason Gabriel; William Isaac; Ed Lockhart; Simon Osindero; Laura Rimell; Chris Dyer; Oriol Vinyals; Kareem Ayoub; Jeff Stanway; Lorrayne Bennett; Demis Hassabis; Koray Kavukcuoglu; Geoffrey Irving
    Description

    MassiveText is a collection of large English-language text datasets from multiple sources: web pages, books, news articles, and code. The data pipeline includes text quality filtering, removal of repetitious text, deduplication of similar documents, and removal of documents with significant test-set overlap. MassiveText contains 2.35 billion documents or about 10.5 TB of text.

    Usage: Gopher is trained on 300B tokens (12.8% of the tokens in the dataset), so the authors sub-sample from MassiveText with sampling proportions specified per subset (books, news, etc.). These sampling proportions are tuned to maximize downstream performance. The largest sampling subset is the curated web-text corpus MassiveWeb, which is found to improve downstream performance relative to existing web-text datasets such as C4 (Raffel et al., 2020).

    Find Datasheets in the Gopher paper.

  13. o

    School information and student demographics

    • data.ontario.ca
    • datasets.ai
    • +1more
    xlsx
    Updated Jul 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Education (2025). School information and student demographics [Dataset]. https://data.ontario.ca/dataset/school-information-and-student-demographics
    Explore at:
    xlsx(1565910), xlsx(1550796), xlsx(1566878), xlsx(1565304), xlsx(1562805), xlsx(1459001), xlsx(1462006), xlsx(1460629), xlsx(1500842), xlsx(1482917), xlsx(1547704), xlsx(1567330), xlsx(1580734), xlsx(1462064)Available download formats
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    Education
    License

    https://www.ontario.ca/page/open-government-licence-ontariohttps://www.ontario.ca/page/open-government-licence-ontario

    Time period covered
    Jun 6, 2025
    Area covered
    Ontario
    Description

    Data includes: board and school information, grade 3 and 6 EQAO student achievements for reading, writing and mathematics, and grade 9 mathematics EQAO and OSSLT. Data excludes private schools, Education and Community Partnership Programs (ECPP), summer, night and continuing education schools.

    How Are We Protecting Privacy?

    Results for OnSIS and Statistics Canada variables are suppressed based on school population size to better protect student privacy. In order to achieve this additional level of protection, the Ministry has used a methodology that randomly rounds a percentage either up or down depending on school enrolment. In order to protect privacy, the ministry does not publicly report on data when there are fewer than 10 individuals represented.

      * Percentages depicted as 0 may not always be 0 values as in certain situations the values have been randomly rounded down or there are no reported results at a school for the respective indicator. * Percentages depicted as 100 are not always 100, in certain situations the values have been randomly rounded up.
    The school enrolment totals have been rounded to the nearest 5 in order to better protect and maintain student privacy.

    The information in the School Information Finder is the most current available to the Ministry of Education at this time, as reported by schools, school boards, EQAO and Statistics Canada. The information is updated as frequently as possible.

    This information is also available on the Ministry of Education's School Information Finder website by individual school.

    Descriptions for some of the data types can be found in our glossary.

    School/school board and school authority contact information are updated and maintained by school boards and may not be the most current version. For the most recent information please visit: https://data.ontario.ca/dataset/ontario-public-school-contact-information.

  14. Data from: A large-scale fMRI dataset for human action recognition

    • openneuro.org
    Updated Jun 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ming Zhou; Zhengxin Gong; Yuxuan Dai; Yushan Wen; Youyi Liu; Zonglei Zhen (2023). A large-scale fMRI dataset for human action recognition [Dataset]. http://doi.org/10.18112/openneuro.ds004488.v1.1.1
    Explore at:
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    OpenNeurohttps://openneuro.org/
    Authors
    Ming Zhou; Zhengxin Gong; Yuxuan Dai; Yushan Wen; Youyi Liu; Zonglei Zhen
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Summary

    Human action recognition is one of our critical living abilities, allowing us to interact easily with the environment and others in everyday life. Although the neural basis of action recognition has been widely studied using a few categories of actions from simple contexts as stimuli, how the human brain recognizes diverse human actions in real-world environments still need to be explored. Here, we present the Human Action Dataset (HAD), a large-scale functional magnetic resonance imaging (fMRI) dataset for human action recognition. HAD contains fMRI responses to 21,600 video clips from 30 participants. The video clips encompass 180 human action categories and offer a comprehensive coverage of complex activities in daily life. We demonstrate that the data are reliable within and across participants and, notably, capture rich representation information of the observed human actions. This extensive dataset, with its vast number of action categories and exemplars, has the potential to deepen our understanding of human action recognition in natural environments.

    Data record

    The data were organized according to the Brain-Imaging-Data-Structure (BIDS) Specification version 1.7.0 and can be accessed from the OpenNeuro public repository (accession number: ds004488). The raw data of each subject were stored in "sub-< ID>" directories. The preprocessed volume data and the derived surface-based data were stored in “derivatives/fmriprep” and “derivatives/ciftify” directories, respectively. The video clips stimuli were stored in “stimuli” directory.

    Video clips stimuli The video clips stimuli selected from HACS are deposited in the "stimuli" folder. Each of the 180 action categories holds a folder in which 120 unique video clips are stored.

    Raw data The data for each participant are distributed in three sub-folders, including the “anat” folder for the T1 MRI data, the “fmap” folder for the field map data, and the “func” folder for functional MRI data. The events file in “func” folder contains the onset, duration, trial type (category index) in specific scanning run.

    Preprocessed volume data from fMRIprep The preprocessed volume-based fMRI data are in subject's native space, saved as “sub-

    Preprocessed surface data from ciftify Under the “results” folder, the preprocessed surface-based data are saved in standard fsLR space, named as “sub-

  15. w

    Immigration system statistics data tables

    • gov.uk
    Updated May 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
    Explore at:
    Dataset updated
    May 22, 2025
    Dataset provided by
    GOV.UK
    Authors
    Home Office
    Description

    List of the data tables as part of the Immigration System Statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

    If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

    Accessible file formats

    The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
    If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
    Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Immigration system statistics, year ending March 2025
    Immigration system statistics quarterly release
    Immigration system statistics user guide
    Publishing detailed data tables in migration statistics
    Policy and legislative changes affecting migration to the UK: timeline
    Immigration statistics data archives

    Passenger arrivals

    https://assets.publishing.service.gov.uk/media/68258d71aa3556876875ec80/passenger-arrivals-summary-mar-2025-tables.xlsx">Passenger arrivals summary tables, year ending March 2025 (MS Excel Spreadsheet, 66.5 KB)

    ‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

    Electronic travel authorisation

    https://assets.publishing.service.gov.uk/media/681e406753add7d476d8187f/electronic-travel-authorisation-datasets-mar-2025.xlsx">Electronic travel authorisation detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 56.7 KB)
    ETA_D01: Applications for electronic travel authorisations, by nationality ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

    Entry clearance visas granted outside the UK

    https://assets.publishing.service.gov.uk/media/68247953b296b83ad5262ed7/visas-summary-mar-2025-tables.xlsx">Entry clearance visas summary tables, year ending March 2025 (MS Excel Spreadsheet, 113 KB)

    https://assets.publishing.service.gov.uk/media/682c4241010c5c28d1c7e820/entry-clearance-visa-outcomes-datasets-mar-2025.xlsx">Entry clearance visa applications and outcomes detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 29.1 MB)
    Vis_D01: Entry clearance visa applications, by nationality and visa type
    Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

    Additional dat

  16. h

    Emilia-Dataset

    • huggingface.co
    Updated Aug 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amphion (2024). Emilia-Dataset [Dataset]. https://huggingface.co/datasets/amphion/Emilia-Dataset
    Explore at:
    Dataset updated
    Aug 27, 2024
    Dataset authored and provided by
    Amphion
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

    This is the official repository 👑 for the Emilia dataset and the source code for the Emilia-Pipe speech data preprocessing pipeline.

      News 🔥
    

    2025/02/26: The Emilia-Large dataset, featuring over 200,000 hours of data, is now available!!! Emilia-Large combines the original 101k-hour Emilia dataset (licensed under CC BY-NC 4.0) with the brand-new 114k-hour Emilia-YODAS… See the full description on the dataset page: https://huggingface.co/datasets/amphion/Emilia-Dataset.

  17. Top 3000+ Cryptocurrency Dataset

    • kaggle.com
    Updated Apr 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sourav Banerjee (2023). Top 3000+ Cryptocurrency Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/cryptocurrency-dataset-2021-395-types-of-crypto
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 9, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sourav Banerjee
    Description

    Context

    A cryptocurrency, crypto-currency, or crypto is a collection of binary data which is designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, which is a computerized database using strong cryptography to secure transaction records, to control the creation of additional coins, and to verify the transfer of coin ownership. Cryptocurrencies are generally fiat currencies, as they are not backed by or convertible into a commodity. Some crypto schemes use validators to maintain the cryptocurrency. In a proof-of-stake model, owners put up their tokens as collateral. In return, they get authority over the token in proportion to the amount they stake. Generally, these token stakes get additional ownership in the token overtime via network fees, newly minted tokens, or other such reward mechanisms.

    Cryptocurrency does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies typically use decentralized control as opposed to a central bank digital currency (CBDC). When a cryptocurrency is minted or created prior to issuance or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database

    A cryptocurrency is a tradable digital asset or digital form of money, built on blockchain technology that only exists online. Cryptocurrencies use encryption to authenticate and protect transactions, hence their name. There are currently over a thousand different cryptocurrencies in the world, and many see them as the key to a fairer future economy.

    Bitcoin, first released as open-source software in 2009, is the first decentralized cryptocurrency. Since the release of bitcoin, many other cryptocurrencies have been created.

    Content

    This Dataset is a collection of records of 3000+ Different Cryptocurrencies. * Top 395+ from 2021 * Top 3000+ from 2023

    Structure of the Dataset

    https://i.imgur.com/qGVJaHl.png" alt="">

    Acknowledgements

    This Data is collected from: https://finance.yahoo.com/. If you want to learn more, you can visit the Website.

    Cover Photo by Worldspectrum: https://www.pexels.com/photo/ripple-etehereum-and-bitcoin-and-micro-sdhc-card-844124/

  18. N

    Illinois Hispanic or Latino Population Distribution by Ancestries Dataset :...

    • neilsberg.com
    csv, json
    Updated Feb 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). Illinois Hispanic or Latino Population Distribution by Ancestries Dataset : Detailed Breakdown of Hispanic or Latino Origins // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/illinois-population-by-race/
    Explore at:
    json, csvAvailable download formats
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Illinois
    Variables measured
    Hispanic or Latino population with Cuban ancestry, Hispanic or Latino population with Mexican ancestry, Hispanic or Latino population with Puerto Rican ancestry, Hispanic or Latino population with Other Hispanic or Latino ancestry, Hispanic or Latino population with Cuban ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Mexican ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Puerto Rican ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Other Hispanic or Latino ancestry as Percent of Total Hispanic Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Origin / Ancestry for Hispanic population and (b) respective population as a percentage of the total Hispanic population, we initially analyzed and categorized the data for each of the ancestries across the Hispanic or Latino population. It is ensured that the population estimates used in this dataset pertain exclusively to ancestries for the Hispanic or Latino population. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Illinois Hispanic or Latino population. It includes the distribution of the Hispanic or Latino population, of Illinois, by their ancestries, as identified by the Census Bureau. The dataset can be utilized to understand the origin of the Hispanic or Latino population of Illinois.

    Key observations

    Among the Hispanic population in Illinois, regardless of the race, the largest group is of Mexican origin, with a population of 1.8 million (76.83% of the total Hispanic population).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Origin for Hispanic or Latino population include:

    • Mexican
    • Puerto Rican
    • Cuban
    • Other Hispanic or Latino

    Variables / Data Columns

    • Origin: This column displays the origin for Hispanic or Latino population for the Illinois
    • Population: The population of the specific origin for Hispanic or Latino population in the Illinois is shown in this column.
    • % of Total Hispanic Population: This column displays the percentage distribution of each Hispanic origin as a proportion of Illinois total Hispanic or Latino population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Illinois Population by Race & Ethnicity. You can refer the same here

  19. h

    the-stack

    • huggingface.co
    • opendatalab.com
    Updated Oct 27, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BigCode (2022). the-stack [Dataset]. https://huggingface.co/datasets/bigcode/the-stack
    Explore at:
    Dataset updated
    Oct 27, 2022
    Dataset authored and provided by
    BigCode
    License

    https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/

    Description

    Dataset Card for The Stack

      Changelog
    

    Release Description

    v1.0 Initial release of the Stack. Included 30 programming languages and 18 permissive licenses. Note: Three included licenses (MPL/EPL/LGPL) are considered weak copyleft licenses. The resulting near-deduplicated dataset is 3TB in size.

    v1.1 The three copyleft licenses ((MPL/EPL/LGPL) were excluded and the list of permissive licenses extended to 193 licenses in total. The list of programming languages… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack.

  20. P

    Cora Dataset

    • paperswithcode.com
    • huggingface.co
    • +1more
    Updated Jul 6, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andrew McCallum; Kamal Nigam; Jason Rennie; Kristie Seymore (2020). Cora Dataset [Dataset]. https://paperswithcode.com/dataset/cora
    Explore at:
    Dataset updated
    Jul 6, 2020
    Authors
    Andrew McCallum; Kamal Nigam; Jason Rennie; Kristie Seymore
    Description

    The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Neilsberg Research (2024). Mercedes, TX Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/2e2c253e-230c-11ef-bd92-3860777c1fe6/

Mercedes, TX Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2024 Edition

Explore at:
csv, jsonAvailable download formats
Dataset updated
Jul 7, 2024
Dataset authored and provided by
Neilsberg Research
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
Texas, Mercedes
Variables measured
Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories idetified by the US Census Bureau. It is ensured that the population estimates used in this dataset pertain exclusively to the identified racial categories, and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Mercedes by race. It includes the population of Mercedes across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Mercedes across relevant racial categories.

Key observations

The percent distribution of Mercedes population by race (across all racial categories recognized by the U.S. Census Bureau): 56.37% are white, 0.04% are Asian, 0.02% are Native Hawaiian and other Pacific Islander, 12.11% are some other race and 31.47% are multiracial.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

Racial categories include:

  • White
  • Black or African American
  • American Indian and Alaska Native
  • Asian
  • Native Hawaiian and Other Pacific Islander
  • Some other race
  • Two or more races (multiracial)

Variables / Data Columns

  • Race: This column displays the racial categories (excluding ethnicity) for the Mercedes
  • Population: The population of the racial category (excluding ethnicity) in the Mercedes is shown in this column.
  • % of Total Population: This column displays the percentage distribution of each race as a proportion of Mercedes total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Mercedes Population by Race & Ethnicity. You can refer the same here

Search
Clear search
Close search
Google apps
Main menu