51 datasets found

Grass Range, MT Population Breakdown by Gender Dataset: Male and Female...
neilsberg.com
csv, json
Updated Feb 24, 2025
Cite
Neilsberg Research (2025). Grass Range, MT Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b235d521-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
Available download formats: json, csv
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Montana, Grass Range
Variables measured
Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Grass Range by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Grass Range across both sexes and to determine which sex constitutes the majority.

Key observations

Females form a considerable majority: 71.13% of the total population is female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Scope of gender:

Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

Variables / Data Columns

Gender: This column displays the Gender (Male / Female)

Population: This column shows the population of each gender in Grass Range.

% of Total Population: This column displays each gender's share of the total population of Grass Range, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding of values.
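
The "% of Total Population" column can be reproduced from the "Population" column with a few lines of Python. The sketch below is illustrative only: the function name `percent_shares` and the counts are hypothetical placeholders, not the actual Grass Range figures, and rounding to two decimals is an assumption based on the 71.13% value quoted above.

```python
def percent_shares(counts):
    """Return each group's share of the total, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total population is zero; shares are undefined")
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


# Hypothetical counts for illustration, NOT the published estimates.
example_counts = {"Male": 1, "Female": 2}
shares = percent_shares(example_counts)

# Because each share is rounded separately, the sum can drift slightly from 100.
rounding_drift = sum(shares.values()) - 100
print(shares, rounding_drift)
```

Comparing `rounding_drift` against a small tolerance is one way to confirm that a deviation from 100% is only a rounding effect.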

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Grass Range Population by Race & Ethnicity. You can refer to it here
Simulation Data Set
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency: http://www.epa.gov/
Description
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means:

File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code Abstract

We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description

“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.

“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information (complete as necessary)

Required R packages:
• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals

Instructions for Use

Reproducibility (Mandatory)

What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.

How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

Data

The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability

Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

Description

Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
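
The week-by-week standardization described above (subtract the weekly median, divide by the weekly IQR) can be sketched in Python as follows. This is not the authors' R code: the function names, the toy exposure matrix, and the quartile convention (`method="inclusive"`) are all assumptions made for illustration, and the source does not say which quartile definition was used.

```python
import statistics


def standardize_column(values):
    """Subtract the median and divide by the IQR of one week's exposures."""
    # Assumed quartile convention; other definitions give slightly different IQRs.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; this week's exposures cannot be scaled")
    return [(v - median) / iqr for v in values]


def standardize_by_week(z):
    """z holds one row per individual and one column per exposure week."""
    columns = [list(col) for col in zip(*z)]
    standardized = [standardize_column(col) for col in columns]
    return [list(row) for row in zip(*standardized)]


# Hypothetical raw exposures: 5 individuals, 2 weeks.
z_raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
z_std = standardize_by_week(z_raw)
print(z_std)
```

Because the medians and IQRs are withheld from the released file, the standardized values in z cannot be mapped back to raw exposure levels.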
NCHS - Percent Distribution of Births for Females by Age Group: United...
data.amerigeoss.org
healthdata.gov
+7 more
csv, json, rdf, xml
Updated Jun 4, 2018
Cite
United States (2018). NCHS - Percent Distribution of Births for Females by Age Group: United States [Dataset]. https://data.amerigeoss.org/dataset/37aaa48b-5ba4-4ef4-8895-2d21b7697aa4
Explore at:
Available download formats: rdf, xml, json, csv
Dataset updated
Jun 4, 2018
Dataset provided by
United States
Area covered
United States
Description
This dataset includes percent distribution of births for females by age group in the United States since 1933.

The number of states in the reporting area has differed over time. In 1915 (when the birth registration area was established), 10 states and the District of Columbia reported births; by 1933, 48 states and the District of Columbia were reporting births, with the last two states, Alaska and Hawaii, added to the registration area in 1959 and 1960, when these regions gained statehood. Reporting area information is detailed in references 1 and 2 below.

SOURCES

CDC/NCHS, National Vital Statistics System, birth data (see http://www.cdc.gov/nchs/births.htm); public-use data files (see http://www.cdc.gov/nchs/data_access/VitalStatsOnline.htm); and CDC WONDER (see http://wonder.cdc.gov/).

REFERENCES

1. National Office of Vital Statistics. Vital Statistics of the United States, 1950, Volume I. 1954. Available from: http://www.cdc.gov/nchs/data/vsus/vsus_1950_1.pdf.

2. Hetzel AM. U.S. vital statistics system: major activities and developments, 1950-95. National Center for Health Statistics. 1997. Available from: http://www.cdc.gov/nchs/data/misc/usvss.pdf.

3. National Center for Health Statistics. Vital Statistics of the United States, 1967, Volume I–Natality. 1967. Available from: http://www.cdc.gov/nchs/data/vsus/nat67_1.pdf.

4. Martin JA, Hamilton BE, Osterman MJK, et al. Births: Final data for 2015. National vital statistics reports; vol 66 no 1. Hyattsville, MD: National Center for Health Statistics. 2017. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr66/nvsr66_01.pdf.
Dataset for: Quantifying how diagnostic test accuracy depends on threshold...
wiley.figshare.com
search.datacite.org
docx
Updated Jun 1, 2023
Cite
Hayley Elizabeth Jones; Constantine Gatsonis; Thomas A Trikalinos; Nicky J Welton; Tony Ades (2023). Dataset for: Quantifying how diagnostic test accuracy depends on threshold in a meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8267015.v1
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.8267015.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Wiley
Authors
Hayley Elizabeth Jones; Constantine Gatsonis; Thomas A Trikalinos; Nicky J Welton; Tony Ades
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Tests for disease often produce a continuous measure, such as the concentration of some biomarker in a blood sample. In clinical practice, a threshold C is selected such that results, say, greater than C are declared positive, and those less than C negative. Measures of test accuracy such as sensitivity and specificity depend crucially on C, and the optimal value of this threshold is usually a key question for clinical practice. Standard methods for meta-analysis of test accuracy (i) do not provide summary estimates of accuracy at each threshold, precluding selection of the optimal threshold, and further (ii) do not make use of all available data. We describe a multinomial meta-analysis model that can take any number of pairs of sensitivity and specificity from each study and explicitly quantifies how accuracy depends on C. Our model assumes that some pre-specified or Box-Cox transformation of test results in the diseased and disease-free populations has a logistic distribution. The Box-Cox transformation parameter can be estimated from the data, allowing for a flexible range of underlying distributions. We parameterise in terms of the means and scale parameters of the two logistic distributions. In addition to credible intervals for the pooled sensitivity and specificity across all thresholds, we produce prediction intervals, allowing for between-study heterogeneity in all parameters. We demonstrate the model using two case study meta-analyses, examining the accuracy of tests for acute heart failure and pre-eclampsia. We show how the model can be extended to explore reasons for heterogeneity using study-level covariates.
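
To make the dependence on the threshold concrete, the sketch below computes empirical sensitivity and specificity at a chosen C. It is a plain illustration of the definitions, not the multinomial meta-analysis model the abstract describes. The biomarker values are hypothetical, and treating a result exactly equal to C as negative is an assumption, since the abstract only covers results greater than or less than C.

```python
def sensitivity_specificity(diseased, disease_free, c):
    """Empirical sensitivity and specificity when results > c are called positive."""
    true_positives = sum(1 for value in diseased if value > c)
    # Assumption: a result exactly equal to c is treated as negative.
    true_negatives = sum(1 for value in disease_free if value <= c)
    return true_positives / len(diseased), true_negatives / len(disease_free)


# Hypothetical biomarker concentrations, for illustration only.
diseased = [5.1, 6.3, 7.8, 9.0]
disease_free = [2.0, 3.5, 4.9, 6.0]

for threshold in (5.0, 7.0):
    sens, spec = sensitivity_specificity(diseased, disease_free, threshold)
    print(threshold, sens, spec)
```

Raising C lowers sensitivity and raises specificity in this toy example, which is the trade-off that makes the choice of threshold a clinical question.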
Data from: A 24-hour dynamic population distribution dataset based on mobile...
explore.openaire.eu
data.niaid.nih.gov
+1 more
Updated Apr 28, 2021
Cite
Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen (2021). A 24-hour dynamic population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland [Dataset]. http://doi.org/10.5281/zenodo.4724388
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.4724388
Dataset updated
Apr 28, 2021
Authors
Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen
Area covered
Helsinki Metropolitan Area, Finland
Description
Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.

In this dataset

We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided for regular workdays (Mon – Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings and a time use survey. The data were validated by comparing population register data from Statistics Finland for night-time hours and a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to, for instance, spatial accessibility analyses, crisis management and planning.

Please cite this dataset as: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4

Organization of data

The dataset is packaged into a single Zip file, Helsinki_dynpop_matrix.zip, which contains the following files:

HMA_Dynamic_population_24H_workdays.csv: represents the dynamic population for an average workday in the study area.

HMA_Dynamic_population_24H_sat.csv: represents the dynamic population for an average Saturday in the study area.

HMA_Dynamic_population_24H_sun.csv: represents the dynamic population for an average Sunday in the study area.

target_zones_grid250m_EPSG3067.geojson: represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.

Column names

YKR_ID: a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.

H0, H1 ... H23: Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0-23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals 100 (i.e. 100% of total population for each one-hour period).

In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.

License

Creative Commons Attribution 4.0 International.

Related datasets

Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612

Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564
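
The join on YKR_ID and the "each hour sums to 100" property can be checked with a short script. The sketch below uses only the Python standard library and tiny in-memory stand-ins for the real files: the three grid cells, their IDs and values, the comma delimiter, and the null geometries are all hypothetical. The only assumption about the real GeoJSON is that YKR_ID is stored as a feature property, which the description implies but does not spell out.

```python
import csv
import io
import json

# Hypothetical miniature of HMA_Dynamic_population_24H_workdays.csv (two hours only).
population_csv = """YKR_ID,H0,H1
1,50.0,25.0
2,30.0,25.0
3,20.0,50.0
"""

# Hypothetical miniature of target_zones_grid250m_EPSG3067.geojson.
grid_geojson = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"YKR_ID": 1}, "geometry": null},
  {"type": "Feature", "properties": {"YKR_ID": 2}, "geometry": null},
  {"type": "Feature", "properties": {"YKR_ID": 3}, "geometry": null}
]}"""

rows = list(csv.DictReader(io.StringIO(population_csv)))
hour_fields = [name for name in rows[0] if name != "YKR_ID"]

# Each hour column should sum to 100 (percent of the total population).
column_sums = {h: sum(float(row[h]) for row in rows) for h in hour_fields}

# Join the hourly shares onto the grid features using YKR_ID as the common key.
rows_by_id = {row["YKR_ID"]: row for row in rows}
grid = json.loads(grid_geojson)
for feature in grid["features"]:
    record = rows_by_id.get(str(feature["properties"]["YKR_ID"]))
    if record is not None:
        for h in hour_fields:
            feature["properties"][h] = float(record[h])

print(column_sums)
```

With the real files, a small tolerance on the column sums is advisable, since 13,231 floating-point shares will rarely add up to exactly 100.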
The ARLs of the proposed chart for , and n = 30.
plos.figshare.com
bin
Updated Aug 17, 2023
Cite
Mustafa Kamal; Gadde Srinivasa Rao; Meshayil M. Alsolmi; Zubair Ahmad; Ramy Aldallal; Md. Mahabubur Rahman (2023). The ARLs of the proposed chart for , and n = 30. [Dataset]. http://doi.org/10.1371/journal.pone.0285914.t011
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.1371/journal.pone.0285914.t011
Dataset updated
Aug 17, 2023
Dataset provided by
PLOS ONE
Authors
Mustafa Kamal; Gadde Srinivasa Rao; Meshayil M. Alsolmi; Zubair Ahmad; Ramy Aldallal; Md. Mahabubur Rahman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistical methodologies have a wide range of practical applications in every applied sector, including education, reliability, management, hydrology, and healthcare sciences. Among these sectors, the implementation of statistical models in health is crucial. Recently, researchers have shown a deep interest in using trigonometric functions to develop new statistical methodologies. In this article, we propose a new statistical methodology using the trigonometric function, namely, a new trigonometric sine-G family of distributions. A subcase (special member) of the new trigonometric sine-G method, called a new trigonometric sine-Weibull distribution, is studied. The estimators of the new trigonometric sine-Weibull distribution are derived. A simulation study of the new trigonometric sine-Weibull distribution is also provided. The applicability of the new trigonometric sine-Weibull distribution is shown by considering a data set taken from the biomedical sector. Furthermore, we introduce an attribute control chart, based on the number of failed items before a fixed time period, for the lifetime of an entity that follows the new trigonometric sine-Weibull distribution. The performance of the suggested chart is investigated using the average run length. A comparative study and a real example are given for the proposed control chart. Based on our study of the existing literature, we did not find any published work on the development of a control chart using new probability distributions that are developed based on the trigonometric function. This surprising gap is a key and interesting motivation of this research.
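
For readers unfamiliar with the average run length (ARL), the sketch below shows the textbook calculation for a generic attribute chart. It is not the chart proposed in the article. It assumes independent samples of n items, a binomial count of failures with per-item failure probability p, and control limits LCL and UCL on that count, so that ARL = 1 / (1 - P(LCL ≤ D ≤ UCL)). The values of p and the limits are hypothetical; only n = 30 is taken from the table title.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(D = k) for D ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def average_run_length(n, p, lcl, ucl):
    """Expected number of samples until the failure count leaves [lcl, ucl]."""
    p_in_control = sum(
        binomial_pmf(k, n, p) for k in range(max(lcl, 0), min(ucl, n) + 1)
    )
    if p_in_control >= 1.0:
        return float("inf")  # the chart can never signal
    return 1.0 / (1.0 - p_in_control)


# Hypothetical failure probability and limits; n = 30 as in the table title.
arl = average_run_length(n=30, p=0.1, lcl=0, ucl=7)
print(arl)
```

In the article, p would follow from the fitted sine-Weibull lifetime distribution and the fixed test time, a step this sketch deliberately leaves out.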
Distribution of Physicians and Allied Health Practitioners by Gross Payment...
open.canada.ca
ouvert.canada.ca
+1 more
html, xlsx
Updated Nov 13, 2024
Cite
Government of Alberta (2024). Distribution of Physicians and Allied Health Practitioners by Gross Payment Range [Dataset]. https://open.canada.ca/data/en/dataset/1b577abd-0934-4432-84c9-738061b0fa9b
Explore at:
Available download formats: xlsx, html
Dataset updated
Nov 13, 2024
Dataset provided by
Government of Alberta
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Time period covered
Apr 1, 2006 - Mar 31, 2022
Description
This table provides a comparison of annual statistics on the Distribution of Physicians and Allied Health Practitioners by Gross Payment Range, based on fee-for-service payments under the Alberta Health Care Insurance Plan (AHCIP). This table is an Excel version of a table in the “Alberta Health Care Insurance Plan Statistical Supplement” report published annually by Alberta Health.
Distribution of Individual Registrants Covered by Local Geographic Area...
open.canada.ca
datasets.ai
+1 more
html, xlsx
Updated Nov 13, 2024
Cite
Government of Alberta (2024). Distribution of Individual Registrants Covered by Local Geographic Area (LGA) [Dataset]. https://open.canada.ca/data/en/dataset/117e72cd-9997-48e4-9766-93141efe22a5
Explore at:
Available download formats: xlsx, html
Dataset updated
Nov 13, 2024
Dataset provided by
Government of Alberta: https://www.alberta.ca/
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Time period covered
Apr 1, 2014 - Mar 31, 2022
Description
This table provides statistics on Distribution of Individual Registrants Covered by Local Geographic Area under the Alberta Health Care Insurance Plan (AHCIP). This table is an Excel version of a table in the "Alberta Health Care Insurance Statistical Supplement" report published annually by Alberta Health.
Population Distribution 1976
open.canada.ca
datasets.ai
+1 more
jpg, pdf
Updated Mar 14, 2022
Cite
Natural Resources Canada (2022). Population Distribution 1976 [Dataset]. https://open.canada.ca/data/en/dataset/97108f1b-133c-5dbc-bc82-f273d5399e90
Explore at:
Available download formats: pdf, jpg
Dataset updated
Mar 14, 2022
Dataset provided by
Natural Resources Canada
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
Contained within the 5th Edition (1978 to 1995) of the National Atlas of Canada is a map that shows places of 5 000+ by proportional circles, smaller places by symbols and census urban areas by colour for Canada in 1976. An inset map shows the area from Windsor to Moncton.
Gender, Age, and Emotion Detection from Voice
kaggle.com
Updated May 29, 2021
Cite
Rohit Zaman (2021). Gender, Age, and Emotion Detection from Voice [Dataset]. https://www.kaggle.com/datasets/rohitzaman/gender-age-and-emotion-detection-from-voice/suggestions
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 29, 2021
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Rohit Zaman
Description
Context

Our target was to predict gender, age and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS. Using the R programming language, we extracted 20 statistical features and then added the labels to form these datasets. Audio files were collected from "Mozilla Common Voice" and “Ryerson AudioVisual Database of Emotional Speech and Song (RAVDESS)”.

Content

The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:

1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset’s dispersion relative to its mean and is calculated as the variance’s square root.
3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of numbers.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the data set numbers are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the middle point between the median and the highest value of the data set.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, i.e. between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution vary from the tails of a normal distribution. It is effectively a measure of the outliers present in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and offers a way to quantify how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which calculates the degree of frequency modulation expressed numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
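
A few of these features (meanfreq, median, Q25, Q75, IQR) can be illustrated with a small Python sketch that treats a frequency spectrum as a weighted distribution. This is one common convention, written here from scratch; the dataset's own R extraction may define the quantities differently. The function names and the flat four-bin spectrum are hypothetical.

```python
def weighted_quantile(freqs, amps, q):
    """Smallest frequency whose cumulative share of amplitude reaches q."""
    total = sum(amps)
    cumulative = 0.0
    for freq, amp in zip(freqs, amps):
        cumulative += amp / total
        if cumulative >= q:
            return freq
    return freqs[-1]


def spectral_summary(freqs, amps):
    """Amplitude-weighted mean frequency and quartiles of a spectrum (kHz)."""
    total = sum(amps)
    q25 = weighted_quantile(freqs, amps, 0.25)
    q75 = weighted_quantile(freqs, amps, 0.75)
    return {
        "meanfreq": sum(f * a for f, a in zip(freqs, amps)) / total,
        "median": weighted_quantile(freqs, amps, 0.50),
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,
    }


# Hypothetical flat spectrum: four frequency bins (kHz) with equal amplitude.
freqs_khz = [0.1, 0.2, 0.3, 0.4]
amplitudes = [1.0, 1.0, 1.0, 1.0]
summary = spectral_summary(freqs_khz, amplitudes)
print(summary)
```

Under this convention meanfreq coincides with the spectral centroid (feature 12), which is worth checking before using both columns in one model.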

Acknowledgements

Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en

Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
The computed sample size, simulated half-width and errors for the 95%...
plos.figshare.com
xls
Updated Jun 19, 2023
Cite
Gwowen Shieh (2023). The computed sample size, simulated half-width and errors for the 95% reference intervals of the distribution proportion 0.90. [Dataset]. http://doi.org/10.1371/journal.pone.0278447.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0278447.t003
Dataset updated
Jun 19, 2023
Dataset provided by
PLOS ONE
Authors
Gwowen Shieh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The computed sample size, simulated half-width and errors for the 95% reference intervals of the distribution proportion 0.90.
Game Feedback Discord Dataset
opendatabay.com
Updated Jul 5, 2025
Cite
Datasimple (2025). Game Feedback Discord Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/c8ccbb59-2931-4b87-adef-d604fb0774b0
Explore at:
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
This dataset captures suggestions for improving a game, sourced from a Discord server where members can submit their ideas. Users have the ability to upvote these suggestions, and any suggestion accumulating 35 or more upvotes is forwarded to the game's developers. The primary purpose of compiling this dataset was for natural language processing (NLP) practice, but it also offers opportunities for applying statistical analysis to understand factors that contribute to a suggestion being sent to the developers. The dataset provides valuable insights into community feedback and engagement.

Columns

SuggestionDate: The date on which the suggestion was originally made.

SuggestionText: The full text content of the suggestion submitted by a Discord server member.

CharCount: A numerical count of the characters within the 'SuggestionText' field.

SuggestionCategory: A categorisation for the suggestion; further details on these categories would typically be found in a more expansive dataset description.

Upvotes: The total number of upvotes received by the suggestion from other members of the Discord server.

ReportedToDevs: A boolean indicator specifying whether the suggestion was reported to the game development team.

Distribution

The dataset is typically provided in a tabular format, such as a CSV file. It contains a total of 158 individual records or rows, each representing a unique game improvement suggestion. The data includes suggestion dates ranging from 1st April 2022 to 26th April 2022. The character count for suggestions varies widely, from 1 to 1831 characters. Categories include 'Feature' (53%), 'Item' (32%), and 'Other' (16%). A small percentage of suggestions (16%) were reported to the developers.

Usage

This dataset is ideally suited for various analytical tasks. It can be used for natural language processing (NLP) exercises, such as sentiment analysis of suggestions, topic modelling, or text summarisation. Additionally, it is suitable for statistical analysis to identify correlations between suggestion characteristics (e.g., length, category, keywords) and their likelihood of receiving upvotes or being reported to the developers. Game developers, community managers, and data analysts can utilise this data to gain actionable insights into player feedback and prioritise development efforts.
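As a hedged illustration of the statistical use described above, the sketch below checks how the stated 35-upvote rule lines up with the ReportedToDevs flag and compares average suggestion length between reported and unreported suggestions. The column names come from the Columns list; the sample rows, the "True"/"False" text encoding of the boolean column, and the helper names are assumptions made for illustration only.

```python
import csv
import io
from statistics import mean

# Hypothetical sample rows (NOT taken from the dataset, which has 158 records).
SAMPLE_CSV = """SuggestionDate,SuggestionText,CharCount,SuggestionCategory,Upvotes,ReportedToDevs
2022-04-01,Add a bigger backpack,21,Item,40,True
2022-04-02,More bugs,9,Feature,3,False
2022-04-03,Let us rename pets,18,Feature,35,True
2022-04-04,Fix it,6,Other,12,False
"""

UPVOTE_THRESHOLD = 35  # threshold stated in the dataset description


def load_rows(text):
    """Parse the CSV text into dictionaries with typed values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "category": row["SuggestionCategory"],
            "chars": int(row["CharCount"]),
            "upvotes": int(row["Upvotes"]),
            # ASSUMPTION: booleans are stored as the text "True"/"False".
            "reported": row["ReportedToDevs"].strip().lower() == "true",
        })
    return rows


def summarize(rows):
    """Compare reported and unreported suggestions."""
    reported = [r for r in rows if r["reported"]]
    others = [r for r in rows if not r["reported"]]
    # Rows where the flag disagrees with the stated 35-upvote rule.
    mismatches = [
        r for r in rows
        if (r["upvotes"] >= UPVOTE_THRESHOLD) != r["reported"]
    ]
    return {
        "reported_share": len(reported) / len(rows),
        "mean_chars_reported": mean(r["chars"] for r in reported),
        "mean_chars_other": mean(r["chars"] for r in others),
        "rule_mismatches": len(mismatches),
    }


summary = summarize(load_rows(SAMPLE_CSV))
print(summary)
```

On the real file, a non-zero rule_mismatches count would suggest that the flag and the upvote threshold were recorded at different times, which is worth checking before modelling.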

Coverage

The dataset's geographic coverage is global, as the Discord server from which suggestions were drawn is accessible worldwide. The time range for the suggestions captured spans from 1st April 2022 to 26th April 2022. The demographic scope includes any member of the specific Discord server who submitted a suggestion. There are no specific notes on data availability limitations for particular groups or years within the provided information.

License

CC0

Who Can Use It

Data Scientists & Machine Learning Engineers: For training NLP models on user-generated content or developing predictive models for suggestion virality.

Game Developers: To understand community sentiment, identify popular feature requests, and prioritise game improvements based on player feedback.

Community Managers: To analyse engagement patterns on their Discord servers and improve feedback collection processes.

Data Analysts: For performing statistical tests to determine which types of suggestions have a higher probability of being adopted or forwarded.

Students & Researchers: For academic projects involving text analysis, social media data, or community engagement studies.

Dataset Name Suggestions

Grounded Suggestions via Discord Server

Game Feedback Discord Dataset

Player Suggestion Analysis

Discord Game Improvement Data

Attributes

Original Data Source: Grounded Suggestions via Discord Server
Harmonized Tree Species Occurrence Points for Europe
zenodo.org
application/gzip, bin +1
Updated Jul 19, 2024
Cite
Johannes Heisig; Tomislav Hengl (2024). Harmonized Tree Species Occurrence Points for Europe [Dataset]. http://doi.org/10.5281/zenodo.4068253
Explore at:
bin, png, application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4068253
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Johannes Heisig; Tomislav Hengl
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Europe
Description
This data set is a harmonized collection of existing data from GBIF, the EU-Forest project and the LUCAS survey. It has about 3 million observations and is supplemented by variables (e.g. location accuracy, land cover type, canopy height, etc.) which enable precise filtering for specific user applications.

The RDS file is created from an sf object and is suitable for fast reading in the R programming environment. The CSV.GZ file contains the records as a table with Easting and Northing in the coordinate reference system ETRS89 / LAEA Europe (EPSG code 3035) and can be loaded into a GIS after being unzipped.

The code producing this data set is publicly available on GitLab.

Variables:

id = unique point identifier

easting = x coordinate

northing = y coordinate

country = ISO country code

species = Latin species name

genus = genus name

scientific_name = long species name

gbif_taxon_key = taxon key from GBIF

gbif_genus_key = genus key from GBIF

taxon_rank = species or genus

year = year of observation

accessed_through = database through which data was accessed (GBIF, LUCAS, EU-Forest)

dataset_info = data set name (individual sub-data-set)

citation = DOI citation of the individual data set

license = distribution license

location_accuracy = spatial accuracy of observation (meters)

flag_location_issue = known location issues present

flag_date_issue = known date issues present

eoo = Extent of occurrence (applying the concept of natural geographical range used for the EU-Forest data set (Mauri et al., 2017) to all other data points. 1 = point inside species range; 0 = point outside; NA = EOO polygon not available for this species)

dbh = Diameter Breast Height (only recorded for observations from the EU-Forest data set (Mauri et al., 2017))

lc1 = LUCAS land cover type 1 (only recorded for observations from LUCAS data)

lc2 = LUCAS land cover type 2 (only recorded for observations from LUCAS data)

landmask_country = land mask overlay 30 meters (NA = not on land)

corine = CORINE 2018 land cover type (extracted from the 100 meter raster data set)

nightlights = light pollution observed by VIIRS (proxy for remoteness / distance to human structures)

canopy_height = canopy height derived from GEDI waveform LiDAR point data

natura_2000 = Natura 2000 site code (if a point falls inside a protected area (GIS-layer) this variable contains the site identification code; all sites can be explored on an interactive map)

freq_location = number of points with identical location (in some cases one location has multiple observations, differing in species and/or year. This may lead to difficulties in certain modeling tasks)

geometry = point geometry in ETRS89 / LAEA Europe

See this detailed documentation for more insights into each variable.

If you would like to know more about the creation of this data set, see

the R-Markdown documenting the process (GitLab repository)

the talk at OpenGeoHub Summer School 2020 (Youtube)

Some advice: This data set is a puzzle with pieces from many different sources. Take some time to explore before including it in your work. Use summary statistics to see which variables have NAs and how many. Choose your filtering criteria wisely. For example, some points with the highest location accuracy have no record for the year of observation. You would exclude these if "year > 1990" were your criterion.
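The pitfall in this advice can be made concrete with a small sketch. It assumes that missing values appear as the string NA in the CSV and uses a few invented rows with the documented column names year and location_accuracy; the helper names are hypothetical, and this is an illustration rather than the authors' workflow.

```python
import csv
import io
from collections import Counter

# Invented rows for illustration; the real file has about 3 million records.
SAMPLE = """id,species,year,location_accuracy
1,Fagus sylvatica,2005,10
2,Picea abies,NA,1
3,Quercus robur,1985,100
4,Abies alba,NA,1
5,Pinus sylvestris,1998,NA
"""


def read_rows(text):
    """Read CSV text into a list of dictionaries (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows, na="NA"):
    """Count missing values per column. ASSUMPTION: missing is written as 'NA'."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value == na:
                missing[column] += 1
    return dict(missing)


def filter_by_year(rows, min_year, keep_missing=False, na="NA"):
    """Keep rows with year > min_year; optionally keep rows with no year."""
    kept = []
    for row in rows:
        if row["year"] == na:
            if keep_missing:
                kept.append(row)
            continue
        if int(row["year"]) > min_year:
            kept.append(row)
    return kept


rows = read_rows(SAMPLE)
print(count_missing(rows))
print(len(filter_by_year(rows, 1990)))                     # strict filter
print(len(filter_by_year(rows, 1990, keep_missing=True)))  # keeps undated points
```

In this toy sample the strict filter silently drops the two most accurately located points because they have no year, which is exactly the trade-off the advice warns about.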

This work has received funding from the European Union's Innovation and Networks Executive Agency (INEA) under Grant Agreement Connecting Europe Facility (CEF) Telecom project 2018-EU-IA-0095 (https://ec.europa.eu/inea/en/connecting-europe-facility/cef-telecom/2018-eu-ia-0095).
Data from: A New Bayesian Approach to Increase Measurement Accuracy Using a...
data.niaid.nih.gov
zenodo.org
Updated Feb 25, 2025
Cite
Dinya, Elek (2025). A New Bayesian Approach to Increase Measurement Accuracy Using a Precision Entropy Indicator [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14417120
Explore at:
Dataset updated
Feb 25, 2025
Dataset provided by
Domjan, Peter
Bertalan, Adam
Vingender, Istvan
Dinya, Elek
Angyal, Viola
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
"We believe that by accounting for the inherent uncertainty in the system during each measurement, the relationship between cause and effect can be assessed more accurately, potentially reducing the duration of research."

Short description

This dataset was created as part of a research project investigating the efficiency and learning mechanisms of a Bayesian adaptive search algorithm supported by the Imprecision Entropy Indicator (IEI) as a novel method. It includes detailed statistical results, posterior probability values, and the weighted averages of IEI across multiple simulations aimed at target localization within a defined spatial environment. Control experiments, including random search, random walk, and genetic algorithm-based approaches, were also performed to benchmark the system's performance and validate its reliability.

The task involved locating a target area centered at (100; 100) within a radius of 10 units (Research_area.png), inside a circular search space with a radius of 100 units. The search process continued until 1,000 successful target hits were achieved.

To benchmark the algorithm's performance and validate its reliability, control experiments were conducted using alternative search strategies, including random search, random walk, and genetic algorithm-based approaches. These control datasets serve as baselines, enabling comprehensive comparisons of efficiency, randomness, and convergence behavior across search methods, thereby demonstrating the effectiveness of our novel approach.

Uploaded files

The first dataset contains the average IEI values, generated by randomly simulating 300 x 1 hits for 10 bins per quadrant (4 quadrants in total) using the Python programming language, and calculating the corresponding IEI values. This resulted in a total of 4 x 10 x 300 x 1 = 12,000 data points. The summary of the IEI values by quadrant and bin is provided in the file results_1_300.csv. The calculation of IEI values for averages is based on likelihood, using an absolute difference-based approach for the likelihood probability computation. IEI_Likelihood_Based_Data.zip

The weighted IEI average values for likelihood calculation (Bayes formula) are provided in the file Weighted_IEI_Average_08_01_2025.xlsx

This dataset contains the results of a simulated target search experiment using Bayesian posterior updates and Imprecision Entropy Indicators (IEI). Each row represents a hit during the search process, including metrics such as Shannon entropy (H), Gini index (G), average distance, angular deviation, and calculated IEI values. The dataset also includes bin-specific posterior probability updates and likelihood calculations for each iteration. The simulation explores adaptive learning and posterior penalization strategies to optimize the search efficiency. Our Bayesian adaptive searching system source code (search algorithm, 1000 target searches): IEI_Self_Learning_08_01_2025.py. This dataset contains the results of 1,000 iterations of a successful target search simulation. The simulation runs until the target is successfully located for each iteration. The dataset includes three further main outputs: a) Results files (results{iteration_number}.csv): Details of each hit during the search process, including entropy measures, Gini index, average distance and angle, Imprecision Entropy Indicators (IEI), coordinates, and the bin number of the hit. b) Posterior updates (Pbin_all_steps_{iter_number}.csv): Tracks the posterior probability updates for all bins during the search process across multiple steps. c) Likelihood analysis (likelihood_analysis_{iteration_number}.csv): Contains the calculated likelihood values for each bin at every step, based on the difference between the measured IEI and pre-defined IE bin averages. IEI_Self_Learning_08_01_2025.py

Based on the mentioned Python source code (see point 3, Bayesian adaptive searching method with IEI values), we performed 1,000 successful target searches, and the outputs were saved in the Self_learning_model_test_output.zip file.

Bayesian Search (IEI) from different quadrants. This dataset contains the results of Bayesian adaptive target search simulations, including various outputs that represent the performance and analysis of the search algorithm. The dataset includes: a) Heatmaps (Heatmap_I_Quadrant, Heatmap_II_Quadrant, Heatmap_III_Quadrant, Heatmap_IV_Quadrant): These heatmaps represent the search results and the paths taken from each quadrant during the simulations. They indicate how frequently the system selected each bin during the search process. b) Posterior Distributions (All_posteriors, Probability_distribution_posteriors_values, CDF_posteriors_values): Generated based on posterior values, these files track the posterior probability updates, including cumulative distribution functions (CDF) and probability distributions. c) Macro Summary (summary_csv_macro): This file aggregates metrics and key statistics from the simulation. It summarizes the results from the individual results.csv files. d) Heatmap Searching Method Documentation (Bayesian_Heatmap_Searching_Method_05_12_2024): This document visualizes the search algorithm's path, showing how frequently each bin was selected during the 1,000 successful target searches. e) One-Way ANOVA Analysis (Anova_analyze_dataset, One_way_Anova_analysis_results): This includes the database and SPSS calculations used to examine whether the starting quadrant influences the number of search steps required. The analysis was conducted at a 5% significance level, followed by a Games-Howell post hoc test [43] to identify which target-surrounding quadrants differed significantly in terms of the number of search steps. Results were saved in the Self_learning_model_test_results.zip file.

This dataset contains randomly generated sequences of bin selections (1-40) from a control search algorithm (random search) used to benchmark the performance of Bayesian-based methods. The process iteratively generates random numbers until a stopping condition is met (reaching target bins 1, 11, 21, or 31). This dataset serves as a baseline for analyzing the efficiency, randomness, and convergence of non-adaptive search strategies. The dataset includes the following: a) The Python source code of the random search algorithm. b) A file (summary_random_search.csv) containing the results of 1000 successful target hits. c) A heatmap visualizing the frequency of search steps for each bin, providing insight into the distribution of steps across the bins. Random_search.zip

This dataset contains the results of a random walk search algorithm, designed as a control mechanism to benchmark adaptive search strategies (Bayesian-based methods). The random walk operates within a defined space of 40 bins, where each bin has a set of neighboring bins. The search begins from a randomly chosen starting bin and proceeds iteratively, moving to a randomly selected neighboring bin, until one of the stopping conditions is met (bins 1, 11, 21, or 31). The dataset provides detailed records of 1,000 random walk iterations, with the following key components: a) Individual Iteration Results: Each iteration's search path is saved in a separate CSV file (random_walk_results_.csv), listing the sequence of steps taken and the corresponding bin at each step. b) Summary File: A combined summary of all iterations is available in random_walk_results_summary.csv, which aggregates the step-by-step data for all 1,000 random walks. c) Heatmap Visualization: A heatmap file is included to illustrate the frequency distribution of steps across bins, highlighting the relative visit frequencies of each bin during the random walks. d) Python Source Code: The Python script used to generate the random walk dataset is provided, allowing reproducibility and customization for further experiments. Random_walk.zip

This dataset contains the results of a genetic search algorithm implemented as a control method to benchmark adaptive Bayesian-based search strategies. The algorithm operates in a 40-bin search space with predefined target bins (1, 11, 21, 31) and evolves solutions through random initialization, selection, crossover, and mutation over 1,000 successful runs. Dataset Components: a) Run Results: Individual run data is stored in separate files (genetic_algorithm_run_.csv), detailing: Generation: The generation number. Fitness: The fitness score of the solution. Steps: The path length in bins. Solution: The sequence of bins visited. b) Summary File: summary.csv consolidates the best solutions from all runs, including their fitness scores, path lengths, and sequences. c) All Steps File: summary_all_steps.csv records all bins visited during the runs for distribution analysis. d) A heatmap was also generated for the genetic search algorithm, illustrating the frequency of bins chosen during the search process as a representation of the search pathways. Genetic_search_algorithm.zip
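The two simplest control strategies in the file list above (random search and random walk) can be sketched roughly as follows. This is not the authors' code, which is distributed in Random_search.zip and Random_walk.zip. The description does not specify which bins count as neighbours, so the sketch assumes, purely for illustration, that the 40 bins form a ring in which each bin neighbours the bins numbered one below and one above it. The function names and the seed are also hypothetical.

```python
import random

N_BINS = 40
TARGET_BINS = {1, 11, 21, 31}  # stopping bins named in the description


def random_search(rng):
    """Draw bins uniformly at random until a target bin is drawn."""
    path = []
    while True:
        bin_id = rng.randint(1, N_BINS)  # inclusive on both ends
        path.append(bin_id)
        if bin_id in TARGET_BINS:
            return path


def ring_neighbours(bin_id):
    """ASSUMPTION: bins sit on a ring, so bin 1 and bin 40 are adjacent."""
    left = N_BINS if bin_id == 1 else bin_id - 1
    right = 1 if bin_id == N_BINS else bin_id + 1
    return [left, right]


def random_walk(rng):
    """Start in a random bin and step to a random neighbour until a target is hit."""
    current = rng.randint(1, N_BINS)
    path = [current]
    while current not in TARGET_BINS:
        current = rng.choice(ring_neighbours(current))
        path.append(current)
    return path


def mean_steps(strategy, runs, seed=0):
    """Average path length of a strategy over a number of successful runs."""
    rng = random.Random(seed)
    return sum(len(strategy(rng)) for _ in range(runs)) / runs


print("random search:", mean_steps(random_search, 1000))
print("random walk:  ", mean_steps(random_walk, 1000))
```

Comparing the two averages against the step counts in the Bayesian output files gives a first impression of how much the adaptive method gains over non-adaptive baselines, keeping in mind that the real neighbour layout may differ from the ring assumed here.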

Technical Information

The dataset files have been compressed into a standard ZIP archive using Total Commander (version 9.50). The ZIP format ensures compatibility across various operating systems and tools.

The XLSX files were created using Microsoft Excel Standard 2019 (Version 1808, Build 10416.20027)

The Python program was developed using Visual Studio Code (Version 1.96.2, user setup), with the following environment details: Commit fabd6a6b30b49f79a7aba0f2ad9df9b399473380f, built on 2024-12-19. The Electron version is 32.6, and the runtime environment includes Chromium 128.0.6263.186, Node.js 20.18.1, and V8 12.8.374.38-electron.0. The operating system is Windows NT x64 10.0.19045.

The statistical analysis included in this dataset was partially conducted using IBM SPSS Statistics, Version 29.0.1.0

The CSV files in this dataset were created following European standards, using a semicolon (;) as the delimiter instead of a comma, encoded in UTF-8 to ensure compatibility with a wide
Reddit Bitcoin Comments Dataset
opendatabay.com
.undefined
Updated Jul 5, 2025
Cite
Datasimple (2025). Reddit Bitcoin Comments Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/afb22b14-6266-47ec-be7f-c936582d61ab
Explore at:
.undefinedAvailable download formats
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset offers a window into user perspectives on one of the world's most popular cryptocurrencies, Bitcoin. It contains user comments from the Bitcoin Subreddit, spanning from early 2020 until now, providing insights into user conversations, topics discussed, and sentiments expressed within this vibrant online community. It is a valuable resource for breaking down comments based on time, replies, and score to gain unique insights, follow trends over time, and identify primary hot topics that excite the Bitcoin subreddit.

Columns

title: The title of the comment. (String)

score: The number of upvotes received by the comment. (Integer)

url: The link to the individual Reddit page where a user can view all replies/responses associated with their initial post/comment. (String)

comms_num: The number of replies made regarding a particular initial post/comment. (Integer)

created: Date & time when the comment was initially posted. (DateTime)

body: Main content text provided in each individual post/comment. (String)

timestamp: Time stamp converted into a local US zone setting. (DateTime)

Distribution

The data file is typically in CSV format. It contains comments from the Bitcoin Subreddit. While a single total row count is not specified, examples of data distribution include score ranges from -9.00 to 4304.00, with 1,852 records in the -9.00 to 422.30 range. Timestamp data is provided in specific bins, for example, from 1670500766.00 to 1670593199.10 containing 81 records, and daily counts such as 979 records for 12/18/2022 - 12/19/2022.
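The bin edges quoted above look like Unix epoch seconds. Under that assumption (the description does not state the unit explicitly), a short sketch shows how they might be converted to UTC datetimes for time-based analysis. The helper name is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert epoch seconds (number or numeric string) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(seconds), tz=timezone.utc)


# Bin edges as quoted in the Distribution paragraph.
start = epoch_to_utc("1670500766.00")
end = epoch_to_utc("1670593199.10")

print(start.isoformat())
print(end.isoformat())
print("bin width:", end - start)
```

Both edges fall in December 2022, which is consistent with the daily counts mentioned for that month.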

Usage

This dataset is ideal for various applications, including: * Conducting sentiment analysis of Bitcoin Subreddit comments to examine the public's perception of cryptocurrency. * Identifying and visualising correlations between Reddit comments and changes in the value of Bitcoin cryptocurrency markets over time. * Identifying user trends in topic preferences for Bitcoin discussions on Reddit by analysing the body content, topics discussed, and URL associated with each comment. A working understanding of statistical concepts such as descriptive statistics, central tendency, and distributions, as well as basic SQL queries, is helpful for utilising this data effectively.

Coverage

The dataset covers user comments from the Bitcoin Subreddit. Its time range spans from early 2020 until now. Geographic scope is global, reflecting the nature of Reddit. Specific examples of data availability are shown for daily periods in December 2022.

License

CC0

Who Can Use It

This dataset is valuable for: * Data Scientists and Analysts: To gain unique insights into user conversations, topics, and sentiments in the Bitcoin community. * Researchers: For studying cryptocurrency market dynamics, public perception, and online community behaviour. * Developers: To build applications that track or analyse cryptocurrency discussions.

Dataset Name Suggestions

Reddit Bitcoin Comments Dataset

Bitcoin Subreddit Activity Log

Cryptocurrency Discussion Data

Bitcoin User Perspectives

Attributes

Original Data Source: Reddit: /r/Bitcoin
South Range, MI Population Breakdown by Gender Dataset: Male and Female...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Cite
Neilsberg Research (2025). South Range, MI Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b254570a-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Michigan, South Range
Variables measured
Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of South Range by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of South Range across both sexes and to determine which sex constitutes the majority.

Key observations

There is a slight majority of male population, with 52.64% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Scope of gender :

Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

Variables / Data Columns

Gender: This column displays the Gender (Male / Female)

Population: The population of each gender in South Range is shown in this column.

% of Total Population: This column displays the percentage distribution of each gender as a proportion of South Range total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for South Range Population by Race & Ethnicity. You can refer to it here
Coronary heart disease (in persons of all ages): England
data.catchmentbasedapproach.org
hub.arcgis.com
Updated Apr 7, 2021
Cite
The Rivers Trust (2021). Coronary heart disease (in persons of all ages): England [Dataset]. https://data.catchmentbasedapproach.org/items/832de0122e4b4bba9ff69cadc1bf53c4
Explore at:
Dataset updated
Apr 7, 2021
Dataset authored and provided by
The Rivers Trust
Area covered

Description
SUMMARY

This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of coronary heart disease (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

ANALYSIS METHODOLOGY

The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to coronary heart disease (in persons of all ages).

This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.

The percentage of each MSOA’s population (all ages) with coronary heart disease was estimated. This was achieved by calculating a weighted average based on:

The percentage of the MSOA area that was covered by each GP practice’s catchment area

Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness

The estimated percentage of each MSOA’s population with coronary heart disease was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with coronary heart disease, within the relevant age range.

Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:

A) the PERCENTAGE of the population within that MSOA who are estimated to have coronary heart disease

B) the NUMBER of people within that MSOA who are estimated to have coronary heart disease

An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have coronary heart disease, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from coronary heart disease, and where those people make up a large percentage of the population, indicating there is a real issue with coronary heart disease within the population and the investment of resources to address that issue could have the greatest benefits.

LIMITATIONS

1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).

2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.

3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of coronary heart disease, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of coronary heart disease.

TO BE VIEWED IN COMBINATION WITH:

This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:

Health and wellbeing statistics (GP-level, England): Missing data and potential outliers

Levels of obesity, inactivity and associated illnesses (England): Missing data

DOWNLOADING THIS DATA

To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

DATA SOURCES

This dataset was produced using:

Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.

GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.

COPYRIGHT NOTICE

The reproduction of this data must be accompanied by the following statement:

© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.

CaBA HEALTH & WELLBEING EVIDENCE BASE

This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
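The weighted-average step and the relative scoring in the methodology of this record can be illustrated with a rough sketch. It is not the Ribble Rivers Trust's implementation: the description does not say how scores were rescaled to the 0 to 1 range, so min-max scaling is assumed here, and all MSOA names, coverage shares, prevalence figures, and populations are invented.

```python
def weighted_prevalence(practices):
    """Coverage-weighted average prevalence for one MSOA.

    practices: list of (coverage_share, prevalence_pct) pairs, one per GP
    practice whose catchment overlaps the MSOA. Shares are normalised because
    overlapping catchments need not sum to 1.
    """
    total = sum(share for share, _ in practices)
    return sum(share * pct for share, pct in practices) / total


def min_max(values):
    """Rescale values to 0..1 (ASSUMPTION: min-max scaling, 1 = highest)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_msoas(msoas):
    """Combine percentage-based (A) and count-based (B) scores per MSOA."""
    names = list(msoas)
    pct = [weighted_prevalence(msoas[n]["practices"]) for n in names]
    count = [p / 100 * msoas[n]["population"] for n, p in zip(names, pct)]
    score_a = min_max(pct)
    score_b = min_max(count)
    combined = min_max([(a + b) / 2 for a, b in zip(score_a, score_b)])
    return dict(zip(names, combined))


# Invented example data: coverage shares, prevalence (%), and populations.
EXAMPLE = {
    "MSOA A": {"population": 8000, "practices": [(0.6, 3.0), (0.4, 5.0)]},
    "MSOA B": {"population": 6000, "practices": [(1.0, 2.0)]},
    "MSOA C": {"population": 10000, "practices": [(0.5, 4.0), (0.5, 6.0)]},
}

scores = score_msoas(EXAMPLE)
for name, score in scores.items():
    print(name, round(score, 3))
```

Because both inputs are rescaled relative to the other areas, a score only ranks MSOAs against each other and says nothing about absolute prevalence.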
Construction status report 2016 for Quarter 4
cloud.csiss.gmu.edu
dtechtive.com
+4more
api, csv
Updated Mar 9, 2017
+ more versions
Cite
https://usmart.io/#/org/dhplg (2017). Construction status report 2016 for Quarter 4 [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/construction-status-report-2016-for-quarter-4
Explore at:
csv, api (available download formats)
Dataset updated
Mar 9, 2017
Dataset provided by
https://usmart.io/#/org/dhplg
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Local Authorities, Approved Housing Bodies, the Housing Agency and the D/HPC&LG have been advancing a suite of social housing schemes through a range of delivery mechanisms and programmes. This dataset provides a comprehensive list of the social housing schemes that are currently advancing nationwide and, in some cases, those that were delivered in 2016. The dataset shows the status of the schemes as of the end of Q4 2016. It includes details of all construction projects, broken down by programme and local authority, which will deliver new and much-needed additional social housing stock to alleviate the current shortage of social housing. This delivery will include construction by Local Authorities and Approved Housing Bodies, Regeneration Projects, Capital Assistance Schemes, as well as Rapid Delivery projects, and projects funded under the Capital Advance and Leasing Facility. It is envisaged that the dataset will be updated on a quarterly basis in conjunction with the publication of the Quarterly Progress Reports on implementation of Rebuilding Ireland – an Action Plan for Housing and Homelessness.
NBA Lottery Picks from 1995 - 2020
kaggle.com
Updated Nov 27, 2020
Cite
Skanda Sastry (2020). NBA Lottery Picks from 1995 - 2020 [Dataset]. https://www.kaggle.com/skandasastry/nba-lottery-picks-from-1995-2020/tasks
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 27, 2020
Dataset provided by
Kaggle
Authors
Skanda Sastry
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Introduction

I've been really interested in plotting and visualizing different NBA trends throughout this Thanksgiving break. Recently, I have been wanting to fact-check a common axiom I hear around the NBA during draft season: the notion that *older* draft prospects tend to have *lower* upside. This is such a widespread belief that it can be heard on all levels, from NBA fan discussion on r/nba, to media draft analysis, to even GMs speaking about their draft choices.

For this visualization, I calculated the age of every lottery pick in the NBA draft from 1995 - 2015. I started at 1995 since this was the first modern "prep-to-pro" year with Kevin Garnett jumping from high school to the NBA. I ended at 2015 since I don't think we can develop an accurate read on the career trajectory of draft picks chosen after 2015 yet.
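
The age calculation described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function names `age_at_draft` and `age_bucket`, the 365.25-day year, and the exact bucket boundaries are assumptions, and the example dates belong to a hypothetical prospect.

```python
from datetime import date


def age_at_draft(birth: date, draft: date) -> float:
    """Age in years on draft day, using an average 365.25-day year (assumption)."""
    return (draft - birth).days / 365.25


def age_bucket(age: float) -> str:
    """Map an age to an age-range label like those in the sample-size table.

    Assumed boundaries: below 18 -> "18 and under", 23 or older -> "23 +",
    otherwise a one-year range such as "19 - 20" (lower bound inclusive).
    """
    if age < 18:
        return "18 and under"
    if age >= 23:
        return "23 +"
    lower = int(age)
    return f"{lower} - {lower + 1}"


# Hypothetical prospect (not a real player): born 2000-01-15, drafted 2019-06-20.
age = age_at_draft(date(2000, 1, 15), date(2019, 6, 20))
print(round(age, 1), age_bucket(age))
```

Using a fractional age rather than whole years is what makes comparisons such as "20.8 year old" versus "18.9 year old" possible.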

For each age range, I plotted a boxplot to visualize the distribution of the players' career PER, WS/48, BPM, and VORP. Let me know if you prefer to see another stat included here - I just went with the ones that Basketball Reference had publicly available.

Data

Here is the link to my plot

Key Results and Conclusions

Minimal differences among 18-21 year old prospects

It seems that differences in "upside" among 18-21 year old prospects are largely a product of our intuition, since there does not appear to be any significant difference in performance or success in the NBA for 18-19 year olds when compared to 19-20 and 20-21 year olds. Although VORP shows that the best of the best players since 1995 have been those drafted at age 18-19, the variation in the distributions of BPM, WS/48, and career PER is much lower.

Thus, we should be a lot more careful when assigning more favorable grades to extremely young prospects because they don't seem to have markedly better careers when compared to their slightly older counterparts. (Example: The data shows that 20.8 year old Donovan Mitchell would not have any different upside than 18.9 year old Kevin Knox)

Lower Extreme values for 22+ year old prospects

Interestingly, it looks like the median production is not really affected by the age of the prospect selected at all. However, there are some clear differences in the extremes.

The collective distribution of 22 and 23 year old lottery prospects shows that they tend to have much lower upper quartiles and extreme values, so the best-case scenarios for these types of players are not as exciting. Although this difference is not as pronounced for 18-21 year olds, there is a huge drop off in the upper extreme values when moving from the 21-22 year old range to the 22-23 range.

Contrary to many other contexts, the NBA draft is a lot more about the outliers than it is about the median selection - each team is gambling on their pick becoming a future Tim Duncan or Dirk Nowitzki, and a successful draft would mean finding a franchise player-level talent. Therefore, our final conclusion is that although there are minimal differences in upside when comparing prospects in the 18-21 age range, 22+ year old prospects tend to have markedly lower ceilings than their younger peers.
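
To make the distinction between the median and the upper end of a distribution concrete, here is a minimal sketch using only the Python standard library. The `summarize` helper and the two value lists are hypothetical and are not taken from the dataset; they only show how two groups can share a median while differing in upper quartile and maximum.

```python
from statistics import quantiles


def summarize(values):
    """Return the median, upper quartile, and maximum of a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points.
    _, q2, q3 = quantiles(values, n=4)
    return {"median": q2, "upper_quartile": q3, "max": max(values)}


# Hypothetical career-value numbers for two age ranges (NOT real data).
groups = {
    "younger": [1, 2, 3, 4, 5, 9, 14],
    "older": [1, 2, 3, 4, 5, 6, 7],
}

for label, values in groups.items():
    print(label, summarize(values))
```

In this made-up example both groups have the same median, but the "younger" group has the higher upper quartile and maximum, which is the kind of pattern a boxplot makes visible.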

Acknowledgements/Notes

Data was scraped from Basketball Reference (player pages, draft pages, advanced stats pages) as well as Wikipedia (specific dates of each draft for age calculation). Scraping was done using Beautiful Soup.

Figures were processed using numpy/pandas and visualized in matplotlib.

Sample sizes for each age range:

Age Range      Sample Size
18 and under   2
18 - 19        24
19 - 20        70
20 - 21        75
21 - 22        66
22 - 23        44
23 +           13
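
As a quick check on the table above, the bucket counts can be totaled and converted to shares. This is a small sketch; the variable names are illustrative, and the counts are copied from the table.

```python
# Sample sizes copied from the table above.
sample_sizes = {
    "18 and under": 2,
    "18 - 19": 24,
    "19 - 20": 70,
    "20 - 21": 75,
    "21 - 22": 66,
    "22 - 23": 44,
    "23 +": 13,
}

total = sum(sample_sizes.values())
# Share of all lottery picks in each age range, in percent.
shares = {label: 100 * n / total for label, n in sample_sizes.items()}
# Picks aged 22 or older (the two oldest buckets combined).
older = sample_sizes["22 - 23"] + sample_sizes["23 +"]

for label, n in sample_sizes.items():
    print(f"{label:<13}{n:>4}{shares[label]:>7.1f}%")
print(f"Total: {total}; picks aged 22 or older: {older}")
```

The smallest buckets ("18 and under" and "23 +") contain few players, so their boxplots rest on much less data than the middle ranges.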
A dataset of lakes with the area above 10 km2 in northwest China (2000 –...
scidb.cn
Updated Jun 22, 2018
Cite
张大弘; 李晓锋; 姚晓军 (2018). A dataset of lakes with the area above 10 km2 in northwest China (2000 – 2014) [Dataset]. http://doi.org/10.11922/sciencedb.621
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.11922/sciencedb.621
Dataset updated
Jun 22, 2018
Dataset provided by
Science Data Bank
Authors
张大弘; 李晓锋; 姚晓军
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Northwestern China
Description
Abstract: Northwest China lies deep inland and has a dry climate. Variation in lake area reflects, to a certain extent, the temporal and spatial distribution of regional water resources. For this data set, the interpretation time was determined from a comprehensive analysis of meteorological data and the actual coverage of Landsat series satellite images. With reference to the "Lake data set of 1:250000 above 1km2 in China (2005-2006)", 113 lakes were selected as vectorization objects: lakes that are in natural conditions, are larger than 10 km2, and are not dry salt lakes. Lake boundary vector data for 15 periods from 2000 to 2014 were extracted by manual visual interpretation. Following the principles for manual visual interpretation of lake area set out by the “Investigation on water quality, water quantity and biological resources of lakes in China”, a special project for basic science and technology work of the Ministry of Science and Technology, the accuracy of interpretation is controlled to within one pixel.
The data set includes three parts: (1) boundary vector data of northwest China, (2) lake boundary vector data from 2000 to 2014, and (3) a vector file of lake locations and area statistics. The data set broadly reflects the changes in lake boundaries in northwest China from 2000 to 2014 and can be used as basic data for research on temporal and spatial changes of lakes in the region, climate change, and human intervention in regional water resources.

Cite

Neilsberg Research (2025). Grass Range, MT Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b235d521-f25d-11ef-8c1b-3860777c1fe6/

Grass Range, MT Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition

Explore at:

json, csv (available download formats)

Dataset updated

Feb 24, 2025

Dataset authored and provided by

Neilsberg Research

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

Montana, Grass Range

Variables measured

Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population

Measurement technique

The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.

Dataset funded by

Neilsberg Research

Description

About this dataset

Context

The dataset tabulates the population of Grass Range by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Grass Range across both sexes and to determine which sex constitutes the majority.

Key observations

There is a considerable majority of female population, with 71.13% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Scope of gender:

Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are asked to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

Variables / Data Columns

Gender: This column displays the Gender (Male / Female)
Population: This column shows the population of each gender in Grass Range.
% of Total Population: This column displays the percentage distribution of each gender as a proportion of the total population of Grass Range. Please note that the percentages may not sum to exactly 100% due to rounding of values.
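
The relationship between the Population and % of Total Population columns can be sketched as below. This is an illustration under stated assumptions, not Neilsberg's code: the `percent_shares` helper is hypothetical, and the counts are hypothetical values chosen so that the female share comes out to the 71.13% quoted above; they are not taken from the dataset.

```python
def percent_shares(counts, ndigits=2):
    """Convert per-group counts to percentages of the total, rounded to ndigits."""
    total = sum(counts.values())
    return {group: round(100 * n / total, ndigits) for group, n in counts.items()}


# Hypothetical counts (NOT taken from the dataset).
counts = {"Male": 28, "Female": 69}
shares = percent_shares(counts)

# Because each share is rounded separately, the rounded values can differ
# slightly from 100 in total; this measures that gap.
rounding_gap = abs(100 - sum(shares.values()))
print(shares, rounding_gap)
```

Since these figures are survey estimates, any shares computed this way carry the same margin of error as the underlying counts.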

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Grass Range Population by Race & Ethnicity. You can refer to it here
