41 datasets found
  1. Summary descriptive statistics of TIMSS dataset.

    • plos.figshare.com
    xls
    Updated Feb 2, 2024
    Cite
    Jonathan Fries; Sandra Oberleiter; Jakob Pietschnig (2024). Summary descriptive statistics of TIMSS dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0297033.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jonathan Fries; Sandra Oberleiter; Jakob Pietschnig
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Regression ranks among the most popular statistical analysis methods across many research areas, including psychology. Typically, regression coefficients are displayed in tables. While this mode of presentation is information-dense, extensive tables can be cumbersome to read and difficult to interpret. Here, we introduce three novel visualizations for reporting regression results. Our methods allow researchers to arrange large numbers of regression models in a single plot. Using regression results from real-world as well as simulated data, we demonstrate the transformations which are necessary to produce the required data structure and how to subsequently plot the results. The proposed methods provide visually appealing ways to report regression results efficiently and intuitively. Potential applications range from visual screening in the model selection stage to formal reporting in research papers. The procedure is fully reproducible using the provided code and can be executed via free-of-charge, open-source software routines in R.

  2. Grass Range, MT Population Breakdown by Gender and Age Dataset: Male and...

    • neilsberg.com
    csv, json
    Updated Feb 19, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Grass Range, MT Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/8de3d033-c989-11ee-9145-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 19, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Grass Range, Montana
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Grass Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Grass Range. The dataset can be used to understand the population distribution of Grass Range by gender and age. For example, it can identify the largest age group for both men and women in Grass Range. It can also show how the gender ratio changes from the youngest to the most senior age group, and the male-to-female ratio in each age group.

    Key observations

    Largest age group (population): Male # 35-39 years (9) | Female # 70-74 years (30). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the Grass Range population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): This column shows the male population of Grass Range.
    • Population (Female): This column shows the female population of Grass Range.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Grass Range for each age group.
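    The Gender Ratio column described above (males per 100 females) can be recomputed from the two population columns. The following is a minimal sketch, assuming each row has already been read into a dictionary; the function name, dictionary keys, and counts are hypothetical and are not taken from the dataset.

```python
from typing import Optional


def gender_ratio(males: int, females: int) -> Optional[float]:
    """Return the number of males per 100 females, or None if undefined."""
    if females == 0:
        # With no females in the age group the ratio cannot be computed.
        return None
    return 100.0 * males / females


# Hypothetical counts for illustration only; not values from the dataset.
rows = [
    {"age_group": "Under 5 years", "male": 4, "female": 5},
    {"age_group": "5 to 9 years", "male": 3, "female": 0},
]

ratios = {r["age_group"]: gender_ratio(r["male"], r["female"]) for r in rows}
for age_group, ratio in ratios.items():
    print(age_group, ratio)
```

    Returning None for an empty female population is a design choice of this sketch; in a small community such as Grass Range, age groups with zero counts are plausible, and how the published file encodes them is not stated here.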

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Grass Range Population by Gender.

  3. Pre and Post-Exercise Heart Rate Analysis

    • kaggle.com
    zip
    Updated Sep 29, 2024
    Cite
    Abdullah M Almutairi (2024). Pre and Post-Exercise Heart Rate Analysis [Dataset]. https://www.kaggle.com/datasets/abdullahmalmutairi/pre-and-post-exercise-heart-rate-analysis
    Explore at:
    Available download formats: zip (3857 bytes)
    Dataset updated
    Sep 29, 2024
    Authors
    Abdullah M Almutairi
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Overview:

    This dataset contains simulated (hypothetical), AI-generated data, intended to be close to realistic, on the sleep, heart rate, and exercise habits of 500 individuals. It includes both pre-exercise and post-exercise resting heart rates, allowing for analyses such as a dependent t-test (paired-sample t-test) to observe changes in heart rate after an exercise program. The dataset also includes additional health-related variables, such as age, hours of sleep per night, and exercise frequency.

    The data is designed for tasks involving hypothesis testing, health analytics, or even machine learning applications that predict changes in heart rate based on personal attributes and exercise behavior. It can be used to understand the relationships between exercise frequency, sleep, and changes in heart rate.

    File:

    • Filename: heart_rate_data.csv
    • File Format: CSV

    Features (Columns):

    • Age
      Description: The age of the individual.
      Type: Integer
      Range: 18-60 years
      Relevance: Age is an important factor in determining heart rate and the effects of exercise.

    • Sleep Hours
      Description: The average number of hours the individual sleeps per night.
      Type: Float
      Range: 3.0 - 10.0 hours
      Relevance: Sleep is a crucial health metric that can impact heart rate and exercise recovery.

    • Exercise Frequency (Days/Week)
      Description: The number of days per week the individual engages in physical exercise.
      Type: Integer
      Range: 1-7 days/week
      Relevance: More frequent exercise may lead to greater heart rate improvements and better cardiovascular health.

    • Resting Heart Rate Before
      Description: The individual’s resting heart rate measured before beginning a 6-week exercise program.
      Type: Integer
      Range: 50 - 100 bpm (beats per minute)
      Relevance: This is a key health indicator, providing a baseline measurement for the individual’s heart rate.

    • Resting Heart Rate After
      Description: The individual’s resting heart rate measured after completing the 6-week exercise program.
      Type: Integer
      Range: 45 - 95 bpm (lower than the "Resting Heart Rate Before" due to the effects of exercise).
      Relevance: This variable is essential for understanding how exercise affects heart rate over time, and it can be used to perform a dependent t-test analysis.

    • Max Heart Rate During Exercise
      Description: The maximum heart rate the individual reached during exercise sessions.
      Type: Integer
      Range: 120 - 190 bpm
      Relevance: This metric helps in understanding cardiovascular strain during exercise and can be linked to exercise frequency or fitness levels.

    Potential Uses:

    • Dependent T-Test Analysis: The dataset is particularly suited for a dependent (paired) t-test where you compare the resting heart rate before and after the exercise program for each individual.
    • Exploratory Data Analysis (EDA): Investigate relationships between sleep, exercise frequency, and changes in heart rate. Potential analyses include correlations between sleep hours and resting heart rate improvement, or regression analyses to predict heart rate after exercise.
    • Machine Learning: Use the dataset for predictive modeling, and build a beginner regression model to predict post-exercise heart rate using age, sleep, and exercise frequency as features.
    • Health and Fitness Insights: This dataset can be useful for studying how different factors like sleep and age influence heart rate changes and overall cardiovascular health.
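    The dependent (paired) t-test listed among the potential uses can be sketched with the Python standard library alone. This is an illustrative sketch, not the dataset author's method: the function name is hypothetical, and the bpm values below are made up rather than read from heart_rate_data.csv. In practice the two lists would come from the "Resting Heart Rate Before" and "Resting Heart Rate After" columns.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a dependent (paired) t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The test works on the per-individual differences, not the raw values.
    diffs = [b - a for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical resting heart rates in bpm, for illustration only.
before = [72, 80, 65, 90, 77]
after = [68, 75, 66, 84, 73]

t_stat, dof = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

    The resulting statistic is compared against a t distribution with n - 1 degrees of freedom to obtain a p-value; a statistics package would normally handle that step.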

    License: Choose an appropriate open license, such as:

    CC BY 4.0 (Attribution 4.0 International).

    Inspiration for Kaggle Users:

    • How does exercise frequency influence the reduction in resting heart rate?
    • Is there a relationship between sleep and heart rate improvements post-exercise?
    • Can we predict the post-exercise heart rate using other health variables?
    • How do age and exercise frequency interact to affect heart rate?

    Acknowledgments: This is a simulated dataset for educational purposes, generated to demonstrate statistical and machine learning applications in the field of health analytics.

  4. Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6

    • data.europa.eu
    • ckan.publishing.service.gov.uk
    • +2more
    html, sparql
    Updated Oct 11, 2021
    + more versions
    Cite
    Ministry of Housing, Communities and Local Government (2021). Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6 [Dataset]. https://data.europa.eu/data/datasets/subjective-wellbeing-worthwhile-percentage-of-responses-in-range-0-6
    Explore at:
    Available download formats: html, sparql
    Dataset updated
    Oct 11, 2021
    Dataset authored and provided by
    Ministry of Housing, Communities and Local Government
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    Percentage of responses in range 0-6 out of 10 (corresponding to 'low wellbeing') for 'Worthwhile' in the First ONS Annual Experimental Subjective Wellbeing survey.

    The Office for National Statistics has included the four subjective well-being questions below on the Annual Population Survey (APS), the largest of their household surveys.

    • Overall, how satisfied are you with your life nowadays?
    • Overall, to what extent do you feel the things you do in your life are worthwhile?
    • Overall, how happy did you feel yesterday?
    • Overall, how anxious did you feel yesterday?

    This dataset presents results from the second of these questions, "Overall, to what extent do you feel the things you do in your life are worthwhile?" Respondents answer these questions on an 11-point scale from 0 to 10 where 0 is ‘not at all’ and 10 is ‘completely’. The well-being questions were asked of adults aged 16 and older.

    Well-being estimates for each unitary authority or county are derived using data from those respondents who live in that place. Responses are weighted to the estimated population of adults (aged 16 and older) as at end of September 2011.

    The data cabinet also makes available the proportion of people in each county and unitary authority that answer with ‘low wellbeing’ values. For the ‘worthwhile’ question answers in the range 0-6 are taken to be low wellbeing.

    This dataset contains the percentage of responses in the range 0-6. It also contains the standard error, the sample size and lower and upper confidence limits at the 95% level.
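    To make the reported quantities concrete, the sketch below computes the percentage of responses in the range 0-6, a standard error, and approximate 95% confidence limits. It assumes an unweighted simple random sample and a normal approximation; the published estimates are weighted, so this is not the ONS method and would not reproduce the dataset's figures. The function name and the scores are hypothetical.

```python
import math


def low_wellbeing_summary(scores, threshold=6, z=1.96):
    """Summarise the share of 0-10 responses at or below `threshold`."""
    n = len(scores)
    if n == 0:
        raise ValueError("no responses")
    # Proportion of responses classed as 'low wellbeing' (0 to threshold).
    p = sum(1 for s in scores if 0 <= s <= threshold) / n
    # Standard error of a proportion under simple random sampling.
    se = math.sqrt(p * (1 - p) / n)
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return {
        "percent": 100 * p,
        "standard_error": 100 * se,
        "lower_95": 100 * lower,
        "upper_95": 100 * upper,
        "sample_size": n,
    }


# Hypothetical answers to the 'worthwhile' question, for illustration only.
scores = [3, 7, 8, 10, 6, 9, 5, 8, 7, 9]
summary = low_wellbeing_summary(scores)
print(summary)
```

    With a sample this small the interval is very wide, which illustrates why the dataset reports the sample size alongside each estimate.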

    The ONS survey covers the whole of the UK, but this dataset only includes results for counties and unitary authorities in England, for consistency with other statistics available at this website.

    At this stage the estimates are considered ‘experimental statistics’, published at an early stage to involve users in their development and to allow feedback. Feedback can be provided to the ONS via this email address.

    The APS is a continuous household survey administered by the Office for National Statistics. It covers the UK, with the chief aim of providing between-census estimates of key social and labour market variables at a local area level. Apart from employment and unemployment, the topics covered in the survey include housing, ethnicity, religion, health and education. When a household is surveyed all adults (aged 16+) are asked the four subjective well-being questions.

    The 12-month Subjective Well-being APS dataset is a subset of the general APS, because the well-being questions are asked only of persons aged 16 and above who gave a personal interview; proxy answers are not accepted. This reduces the size of the achieved sample to approximately 120,000 adult respondents in England.

    The original data is available from the ONS website.

    Detailed information on the APS and the Subjective Wellbeing dataset is available here.

    As well as collecting data on well-being, the Office for National Statistics has published widely on the topic of wellbeing. Papers and further information can be found here.

  5. Credit Card Eligibility Data: Determining Factors

    • kaggle.com
    zip
    Updated May 18, 2024
    Cite
    Rohit Sharma (2024). Credit Card Eligibility Data: Determining Factors [Dataset]. https://www.kaggle.com/datasets/rohit265/credit-card-eligibility-data-determining-factors
    Explore at:
    Available download formats: zip (303227 bytes)
    Dataset updated
    May 18, 2024
    Authors
    Rohit Sharma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of the Credit Card Eligibility Data: Determining Factors

    The Credit Card Eligibility Dataset: Determining Factors is a comprehensive collection of variables aimed at understanding the factors that influence an individual's eligibility for a credit card. This dataset encompasses a wide range of demographic, financial, and personal attributes that are commonly considered by financial institutions when assessing an individual's suitability for credit.

    Each row in the dataset represents a unique individual, identified by a unique ID, with associated attributes ranging from basic demographic information such as gender and age, to financial indicators like total income and employment status. Additionally, the dataset includes variables related to familial status, housing, education, and occupation, providing a holistic view of the individual's background and circumstances.

    Variable: Description

    • ID: An identifier for each individual (customer).
    • Gender: The gender of the individual.
    • Own_car: A binary feature indicating whether the individual owns a car.
    • Own_property: A binary feature indicating whether the individual owns a property.
    • Work_phone: A binary feature indicating whether the individual has a work phone.
    • Phone: A binary feature indicating whether the individual has a phone.
    • Email: A binary feature indicating whether the individual has provided an email address.
    • Unemployed: A binary feature indicating whether the individual is unemployed.
    • Num_children: The number of children the individual has.
    • Num_family: The total number of family members.
    • Account_length: The length of the individual's account with a bank or financial institution.
    • Total_income: The total income of the individual.
    • Age: The age of the individual.
    • Years_employed: The number of years the individual has been employed.
    • Income_type: The type of income (e.g., employed, self-employed, etc.).
    • Education_type: The education level of the individual.
    • Family_status: The family status of the individual.
    • Housing_type: The type of housing the individual lives in.
    • Occupation_type: The type of occupation the individual is engaged in.
    • Target: The target variable for the classification task, indicating whether the individual is eligible for a credit card or not (e.g., Yes/No, 1/0).

    Researchers, analysts, and financial institutions can leverage this dataset to gain insights into the key factors influencing credit card eligibility and to develop predictive models that assist in automating the credit assessment process. By understanding the relationship between various attributes and credit card eligibility, stakeholders can make more informed decisions, improve risk assessment strategies, and enhance customer targeting and segmentation efforts.

    This dataset is valuable for a wide range of applications within the financial industry, including credit risk management, customer relationship management, and marketing analytics. Furthermore, it provides a valuable resource for academic research and educational purposes, enabling students and researchers to explore the intricate dynamics of credit card eligibility determination.

  6. News Events Data in Asia ( Techsalerator)

    • datarade.ai
    Updated Jul 9, 2024
    Cite
    Techsalerator (2024). News Events Data in Asia ( Techsalerator) [Dataset]. https://datarade.ai/data-products/news-events-data-in-asia-techsalerator-techsalerator
    Explore at:
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Jul 9, 2024
    Dataset provided by
    Techsalerator LLC
    Authors
    Techsalerator
    Area covered
    United Arab Emirates, Timor-Leste, Brunei Darussalam, Kazakhstan, Uzbekistan, Maldives, China, Hong Kong, Kyrgyzstan, Iran (Islamic Republic of)
    Description

    Techsalerator’s News Event Data in Asia offers a detailed and expansive dataset designed to provide businesses, analysts, journalists, and researchers with comprehensive insights into significant news events across the Asian continent. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, offering valuable perspectives on regional developments, economic shifts, political changes, and cultural occurrences.

    Key Features of the Dataset:

    Extensive Coverage:
    The dataset aggregates news events from a wide range of sources such as company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse array of information from multiple reporting channels.

    Categorization of Events:
    News events are categorized into various types including business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors.

    Real-Time Updates:
    The dataset is updated regularly to include the most current events, ensuring users have access to the latest news and can stay informed about recent developments as they happen.

    Geographic Segmentation:
    Events are tagged with their respective countries and regions within Asia. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis.

    Event Details:
    Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event.

    Historical Data:
    The dataset includes historical news event data, enabling users to track trends and perform comparative analysis over time. This feature supports longitudinal studies and provides insights into the evolution of news events.

    Advanced Search and Filter Options:
    Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information.

    Asian Countries and Territories Covered:

    • Central Asia: Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, Uzbekistan
    • East Asia: China, Hong Kong (Special Administrative Region of China), Japan, Mongolia, North Korea, South Korea, Taiwan
    • South Asia: Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, Sri Lanka
    • Southeast Asia: Brunei, Cambodia, East Timor (Timor-Leste), Indonesia, Laos, Malaysia, Myanmar (Burma), Philippines, Singapore, Thailand, Vietnam
    • Western Asia (Middle East): Armenia, Azerbaijan, Bahrain, Cyprus, Georgia, Iraq, Israel, Jordan, Kuwait, Lebanon, Oman, Palestine, Qatar, Saudi Arabia, Syria, Turkey (partly in Europe, but often included in Asia contextually), United Arab Emirates, Yemen

    Benefits of the Dataset:

    • Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis.
    • Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities.
    • Media and PR Monitoring: Journalists and PR professionals can track relevant news across Asia, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively.
    • Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to Asian news and events.

    Techsalerator’s News Event Data in Asia is a crucial resource for accessing and analyzing significant news events across the continent. By offering detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.

  7. Steam Games User Statistics and Features

    • kaggle.com
    zip
    Updated Dec 20, 2023
    Cite
    The Devastator (2023). Steam Games User Statistics and Features [Dataset]. https://www.kaggle.com/datasets/thedevastator/steam-games-user-statistics-and-features/versions/2
    Explore at:
    Available download formats: zip (12130042 bytes)
    Dataset updated
    Dec 20, 2023
    Authors
    The Devastator
    Description

    Steam Games User Statistics and Features

    Steam Games: User Statistics, Features, and Support Details

    By Craig Kelly [source]

    About this dataset

    Steam Game Feature and User Statistic Dataset

    This dataset, titled Steam Game Feature and User Statistics, offers researchers, game developers, and enthusiasts an opportunity to analyze detailed information on various features of games available on the Steam platform. In addition to this, it also records user statistics and provides support details for each game. These data points include the release date of a game, the number of developers involved in its creation, its genre classification among several others.

    More specifically, this dataset yields diverse yet interconnected information about a game’s details, including integer values such as the developer count or DLC availability, and Boolean values indicating, for example, whether a game is free or whether it supports multiple platforms such as Windows, Linux, or Mac. It also offers rich text data providing insights into specific categories such as in-app purchases or VR support.

    Moreover, more specific fields let one determine the individual genre elements associated with each listed title, covering segments from non-gaming activities all the way up to massively multiplayer adventures. The financial aspect of a title is not overlooked either: the dataset provides tracking markers for a game's nominal price point (initial and final rates), along with variances based on region.

    Further adding depth are support email specifics that are listed along with any special notices like external account requirements which paint precise details related to individual titles facilitating perfect clarity for users – be it potential purchasers looking for specific features before deciding on their purchase choice or detailed market research studies trying to gauge user behaviour based on these complex interactions.

    On the technical side, there is a wealth of data mapping minimum and recommended PC specifications, split clearly by operating system (Windows, Linux, and Mac). These outline the requirements for running a game smoothly and provide performance indicators useful for gauging probable market success given typical end-user system capabilities.

    The provided data has been converted to UTF-8 using Python standard library routines, so that localized text is preserved without loss in translation, as can be seen from the well-defined string values.

    To ensure seamless interaction for further analytical processing, each field was sanitized and set up to handle all common data types. Text strings were cleaned up via several iterative steps to facilitate easy interpretation and removal of any unwanted characters or white spaces that would otherwise throw off automated text analysis algorithms.

    In sum, this Steam Game Features and User Statistics dataset is an extremely comprehensive source of information regarding games available on the Steam platform offering multi-faceted insights into gaming trends & opportunities for those savvy enough to interpret

    How to use the dataset

    Guide: How to Use the Steam Game User Statistics and Features Dataset

    Follow these steps:

    1) Understand The Data Types

    The dataset comprises textual, integer, float, and Boolean values.

    • Text has been processed for easy handling: strings have been cleaned by stripping leading and trailing whitespace and converting to lower case.
    • Integer and float fields are numeric; missing values are represented as '0' and '0.0', respectively.
    • Boolean fields can contain only two literal values: True or False.
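    The conventions described in step 1 can be illustrated with a short sketch. It assumes a record has already been loaded as a Python dictionary; the helper names and the sample values are hypothetical, and only the field names ResponseName and DeveloperCount come from the field list in this guide.

```python
from typing import Optional, Union

Number = Union[int, float]


def clean_text(value: str) -> str:
    """Apply the described text cleanup: strip whitespace and lower-case."""
    return value.strip().lower()


def zero_as_missing(value: Number) -> Optional[Number]:
    """Map the 0 / 0.0 placeholder to None so it is not mistaken for data."""
    return None if value == 0 else value


# Hypothetical record for illustration; values are not from the dataset.
record = {"ResponseName": "  Example Game ", "DeveloperCount": 0}

cleaned = {
    "ResponseName": clean_text(record["ResponseName"]),
    "DeveloperCount": zero_as_missing(record["DeveloperCount"]),
}
print(cleaned)
```

    Because missing numbers are stored as 0, a genuine zero cannot be told apart from a missing value. Whether to apply a helper like zero_as_missing therefore depends on the field: it may suit a count that should never be zero, but not a field where zero is a meaningful value.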

    2) Familiarize Yourself With The Fields

    The dataset offers a diverse range of field categories. To use it effectively:

    • Review: Overview columns clarify details such as game name (QueryName, ResponseName), ReleaseDate, RequiredAge, etc.
    • Examine developer features: Columns such as developer count (DeveloperCount), publisher count (PublisherCount), and package count (PackageCount) provide information related to game production.
    • Look for user-focused information: User statistics such as owner count (SteamSpyOwners, SteamSpyOwnersVariance), player estimates (SteamSpyPlayersEstimate, SteamSpyPlayersVariance), recommendat...
  8. Data from: FISBe: A real-world benchmark dataset for instance segmentation...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Apr 2, 2024
    Cite
    Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
    Explore at:
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    German Cancer Research Center
    Max Delbrück Center for Molecular Medicine
    Howard Hughes Medical Institute - Janelia Research Campus
    Max Delbrück Center
    Authors
    Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General

    For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

    Summary

    • A new dataset for neuron instance segmentation in 3D multicolor light microscopy data of fruit fly brains
    • 30 completely labeled (segmented) images
    • 71 partly labeled images
    • altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30 and 60 min on average, yet a difficult one can take up to 4 hours)
    • To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
    • A set of metrics and a novel ranking score for respective meaningful method benchmarking
    • An evaluation of three baseline methods in terms of the above metrics and score

    Abstract

    Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

    Dataset documentation:

    We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

    FISBe Datasheet

    Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

    Files

    fisbe_v1.0_{completely,partly}.zip

    contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

    fisbe_v1.0_mips.zip

    maximum intensity projections of all samples, for convenience.

    sample_list_per_split.txt

    a simple list of all samples and the subset they are in, for convenience.

    view_data.py

    a simple python script to visualize samples, see below for more information on how to use it.

    dim_neurons_val_and_test_sets.json

    a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

    Readme.md

    general information

    How to work with the image files

    Each sample consists of a single 3d MCFO image of neurons of the fruit fly. For each image, we provide a pixel-wise instance segmentation for all separable neurons. Each sample is stored as a separate zarr file (zarr is "a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification"). The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file. The segmentation mask for each neuron is stored in a separate channel. The order of dimensions is CZYX.

    We recommend working in a virtual environment, e.g., by using conda:

    conda create -y -n flylight-env -c conda-forge python=3.9
    conda activate flylight-env

    How to open zarr files

    Install the python zarr package:

    pip install zarr

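    Open a zarr file with:

    import zarr

    # path_to_zarr is a placeholder for the path to one sample's zarr file
    raw = zarr.open(path_to_zarr, mode='r', path="volumes/raw")
    seg = zarr.open(path_to_zarr, mode='r', path="volumes/gt_instances")

    # optional: convert the lazily loaded array to a numpy array
    import numpy as np
    raw_np = np.array(raw)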

    Zarr arrays are read lazily on demand. Many functions that expect numpy arrays also work with zarr arrays. Optionally, the arrays can also be explicitly converted to numpy arrays.

    How to view zarr image files

    We recommend using napari to view the image data.

    Install napari:

    pip install "napari[all]"

    Save the following Python script:

    import zarr, sys, napari

    raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")
    gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

    viewer = napari.Viewer(ndisplay=3)
    for idx, gt in enumerate(gts):
        viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
    viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
    viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
    viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
    napari.run()

    Execute:

    python view_data.py /R9F03-20181030_62_B5.zarr

    Metrics

    S: Average of avF1 and C

    avF1: Average F1 Score

    C: Average ground truth coverage

    clDice_TP: Average true positives clDice

    FS: Number of false splits

    FM: Number of false merges

    tp: Relative number of true positives

    For more information on our selected metrics and formal definitions please see our paper.

    Baseline

    To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al. For detailed information on the methods and the quantitative results please see our paper.

    License

    The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation

    If you use FISBe in your research, please use the following BibTeX entry:

    @misc{mais2024fisbe, title = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures}, author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller}, year = 2024, eprint = {2404.00130}, archivePrefix ={arXiv}, primaryClass = {cs.CV} }

    Acknowledgments

    We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions. P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program. This work was co-funded by Helmholtz Imaging.

    Changelog

    There have been no changes to the dataset so far. All future changes will be listed on the changelog page.

    Contributing

    If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

    All contributions are welcome!

  9. d

    Data from: Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 -...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 - 7—Data [Dataset]. https://catalog.data.gov/dataset/variable-terrestrial-gps-telemetry-detection-rates-parts-1-7data
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    Studies utilizing Global Positioning System (GPS) telemetry rarely result in 100% fix success rates (FSR). Many assessments of wildlife resource use do not account for missing data, either assuming data loss is random or because of a lack of practical treatment for systematic data loss. Several studies have explored how the environment, technological features, and animal behavior influence rates of missing data in GPS telemetry, but previous spatially explicit models developed to correct for sampling bias have been specified to small study areas, on a small range of data loss, or to be species-specific, limiting their general utility. Here we explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived, large mammal habitat use. We also evaluate patterns in missing data that relate to potential animal activities that change the orientation of the antennae and characterize home-range probability of GPS detection for 4 focal species: cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni) and mule deer (Odocoileus hemionus).

    Part 1, Positive Openness Raster (raster dataset): Openness is an angular measure of the relationship between surface relief and horizontal distance. For angles less than 90 degrees it is equivalent to the internal angle of a cone with its apex at a DEM location, and is constrained by neighboring elevations within a specified radial distance. A 480 meter search radius was used for this calculation of positive openness. Openness incorporates the terrain line-of-sight or viewshed concept and is calculated from multiple zenith and nadir angles, here along eight azimuths. Positive openness measures openness above the surface, with high values for convex forms and low values for concave forms (Yokoyama et al. 2002). We calculated positive openness using a custom python script, following the methods of Yokoyama et al. (2002) using a USGS National Elevation Dataset as input.

    Part 2, Northern Arizona GPS Test Collar (csv): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model, fix interval programming, and placed at different study sites. The absence of statistically significant differences (95% CI) between predicted and observed FSRs suggests that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. The model training data are provided here for fix attempts by hour. This table can be linked with the site location shapefile using the site field.

    Part 3, Probability Raster (raster dataset): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model, fix interval programming, and placed at different study sites. The absence of statistically significant differences (95% CI) between predicted and observed FSRs suggests that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. We evaluated GPS telemetry datasets by comparing the mean probability of a successful GPS fix across study animals' home ranges to the actual observed FSR of GPS-downloaded deployed collars on cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni) and mule deer (Odocoileus hemionus). Comparing the mean probability of acquisition within study animals' home ranges and observed FSRs of GPS-downloaded collars resulted in an approximately 1:1 linear relationship with an r-squared of 0.68.

    Part 4, GPS Test Collar Sites (shapefile): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model, fix interval programming, and placed at different study sites. The absence of statistically significant differences (95% CI) between predicted and observed FSRs suggests that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US.

    Part 5, Cougar Home Ranges (shapefile): Cougar home ranges were calculated to compare the mean probability of a GPS fix acquisition across the home range to the actual fix success rate (FSR) of the collar as a means for evaluating whether characteristics of an animal's home range have an effect on observed FSR. We estimated home ranges using the Local Convex Hull (LoCoH) method using the 90th isopleth. Only data obtained from GPS download of retrieved units were used. Satellite-delivered data were omitted from the analysis for animals where the collar was lost or damaged because satellite delivery tends to lose approximately an additional 10% of data. Comparisons with home-range mean probability of fix were also used as a reference for assessing whether the frequency with which animals use areas of low GPS acquisition rates may play a role in observed FSRs.

    Part 6, Cougar Fix Success Rate by Hour (csv): Cougar GPS collar fix success varied by hour of day, suggesting that circadian rhythms with bouts of rest during daylight hours may change the orientation of the GPS receiver, affecting the ability to acquire fixes. Raw data of overall fix success rates (FSR) and FSR by hour were used to predict relative reductions in FSR. Data only include direct GPS download datasets. Satellite-delivered data were omitted from the analysis for animals where the collar was lost or damaged because satellite delivery tends to lose approximately an additional 10% of data.

    Part 7, Openness Python Script version 2.0: This python script was used to calculate positive openness using a 30 meter digital elevation model for a large geographic area in Arizona, California, Nevada and Utah. A scientific research project used the script to explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived, large mammal habitat use.
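    The positive openness measure described in Part 1 can be illustrated with a minimal sketch. This is not the USGS script from Part 7; it is a simplified reading of the description above, assuming a DEM stored as a list of rows with square cells. The function name positive_openness and its parameters are hypothetical. For each of eight azimuths it finds the largest elevation angle within the search radius, converts it to a zenith angle, and averages the results.

```python
import math

# Eight compass directions as (row step, column step) on the DEM grid.
AZIMUTHS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def positive_openness(dem, row, col, cell_size, radius):
    """Return the mean zenith angle (degrees) over eight azimuths at dem[row][col].

    Hypothetical helper: dem is a list of rows of elevations, cell_size and
    radius share the same horizontal unit (e.g. meters).
    """
    zenith_angles = []
    for d_row, d_col in AZIMUTHS:
        step_length = cell_size * math.hypot(d_row, d_col)
        max_elevation_angle = None
        r, c, distance = row + d_row, col + d_col, step_length
        # Walk outward along this azimuth until the grid edge or the radius.
        while 0 <= r < len(dem) and 0 <= c < len(dem[0]) and distance <= radius:
            rise = dem[r][c] - dem[row][col]
            angle = math.degrees(math.atan2(rise, distance))
            if max_elevation_angle is None or angle > max_elevation_angle:
                max_elevation_angle = angle
            r, c, distance = r + d_row, c + d_col, distance + step_length
        # Azimuths with no cells in range (grid edges) are skipped.
        if max_elevation_angle is not None:
            zenith_angles.append(90.0 - max_elevation_angle)
    if not zenith_angles:
        raise ValueError("no neighbouring cells within the search radius")
    return sum(zenith_angles) / len(zenith_angles)


if __name__ == "__main__":
    # Made-up 5 x 5 DEM with a single raised cell in the centre.
    dem = [[100.0] * 5 for _ in range(5)]
    dem[2][2] = 150.0
    print(positive_openness(dem, 2, 2, cell_size=30.0, radius=480.0))
```

    In this sketch a flat surface gives 90 degrees, a convex form (peak) gives more than 90, and a concave form (pit) gives less, matching the high and low values described above.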

  10. d

    Police Transparency - Arrests - Last 90 Days (Dataset)

    • catalog.data.gov
    • performance.tempe.gov
    • +9more
    Updated Oct 25, 2025
    + more versions
    Cite
    City of Tempe (2025). Police Transparency - Arrests - Last 90 Days (Dataset) [Dataset]. https://catalog.data.gov/dataset/police-transparency-arrests-last-90-days-dataset
    Explore at:
    Dataset updated
    Oct 25, 2025
    Dataset provided by
    City of Tempe
    Description

    Main Table / Denormalized Version (last 90 days only)

    This dataset provides demographic information related to arrests made by the Tempe Police Department. Each record represents an individual charge associated with an arrest and includes details about both the person arrested (arrestee) and the arresting officer. Demographic fields include race and ethnicity, age range at the time of arrest, and gender for each party.

    The data is sourced from the Police Department’s Records Management System (RMS) and supports analysis of patterns related to arrests, enforcement activity, and demographic trends over time. This information is a component of ongoing efforts to promote transparency and provide context for law enforcement within the community.

    For detailed guidance on interpreting arrest counts and demographic breakdowns, please refer to the User Guide: Understanding the Arrests Demographic Datasets.

    Why this Dataset is Organized this Way?

    The main arrests open data table includes key information from each arrest event, along with associated person and charge details in one place. This format is ideal for quick viewing and simple analysis. Providing this format supports a wide range of users, from casual data explorers to experienced analysts.

    Understanding the Arrests Open Data (main table / denormalized version)

    Each row in this dataset represents a single charge, which means a single arrest event may appear multiple times if multiple charges were filed. To determine the number of unique arrests, users should perform a distinct count of the rin field, which serves as the arrest incident identifier.

    Likewise:

    • To count unique arrestees, use a distinct count of the pin field (person identifier).
    • To count unique arresting officers, use a distinct count of the arrest_officer field.

    This structure enables users to explore charge-level detail while maintaining the ability to summarize demographic data by arrest event, arrestee, or officer as needed. Visit the User Guide: Understanding the Arrests Demographic Datasets for more details.

    Data Dictionary

    Additional Information

    Contact Email: PD_DataRequest@tempe.gov
    Contact Phone: N/A
    Link: N/A
    Data Source: Versaterm RMS
    Data Source Type: SQL Server
    Preparation Method: Automated process
    Publish Frequency: Daily
    Publish Method: Automatic
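    The distinct counts described for this charge-level table can be sketched in a few lines. The field names rin, pin and arrest_officer come from the description; the helper count_unique and the sample rows are made up for illustration and are not real Tempe records.

```python
def count_unique(records, field):
    """Count distinct values of one field across charge-level rows."""
    return len({record[field] for record in records})


# Made-up sample: four charges that belong to three arrest events.
charges = [
    {"rin": "A1", "pin": "P1", "arrest_officer": "O1"},
    {"rin": "A1", "pin": "P1", "arrest_officer": "O1"},  # second charge, same arrest
    {"rin": "A2", "pin": "P1", "arrest_officer": "O2"},
    {"rin": "A3", "pin": "P2", "arrest_officer": "O2"},
]

if __name__ == "__main__":
    print("charges:", len(charges))
    print("unique arrests:", count_unique(charges, "rin"))
    print("unique arrestees:", count_unique(charges, "pin"))
    print("unique officers:", count_unique(charges, "arrest_officer"))
```

    Counting rows would report four arrests here, while the distinct count of rin reports three, which is why the description recommends distinct counts for anything other than charge totals.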

  11. N

    South Range, MI Population Breakdown by Gender

    • neilsberg.com
    csv, json
    Updated Sep 14, 2023
    + more versions
    Cite
    Neilsberg Research (2023). South Range, MI Population Breakdown by Gender [Dataset]. https://www.neilsberg.com/research/datasets/658fcb29-3d85-11ee-9abe-0aa64bf2eeb2/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Sep 14, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Range, Michigan
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of South Range by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of South Range across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a slight majority of male population, with 50.54% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: This column shows the population of each gender in South Range.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of South Range's total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to check the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for South Range Population by Gender. You can refer to the same here

  12. Variable definitions.

    • plos.figshare.com
    xls
    Updated May 29, 2025
    Cite
    Kübranur Çebi Karaaslan; Hülya Diğer; Tubanur Çebi (2025). Variable definitions. [Dataset]. http://doi.org/10.1371/journal.pone.0324125.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    May 29, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kübranur Çebi Karaaslan; Hülya Diğer; Tubanur Çebi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The aim of this study is to evaluate the level of satisfaction with health examination services in Türkiye. It is thought that the findings will contribute to the more effective management of the health service process and offer potential solutions to identified problems. Notably, a significant portion of the problems encountered in healthcare services tends to arise during the examination phase. Therefore, this research was conducted to address these problems by thoroughly analyzing public satisfaction, with the expectation that such an approach could provide actionable insights for resolving these problems. In the study, the micro data set of the 2023 Life Satisfaction Survey conducted by the Turkish Statistical Institute was used. The analysis process was carried out with a two-stage method. In the first stage, Pearson’s χ² test was used to evaluate whether the independent variables had a statistically significant relationship with satisfaction with health examination services. In the second stage, considering the binary categorical structure of the dependent variable, a logit regression model was applied to estimate the relationship between satisfaction with health examination services and the independent variables. The findings revealed that 61.85% of Turkish citizens were satisfied with health examination services. Furthermore, this level of satisfaction was significantly affected by a wide range of sociodemographic, individual, and institution-related factors. The study’s findings suggest that aligning individuals’ demands in the health service process with guidance from field experts and developing targeted policies could lead to improved satisfaction with health examination services. In addition, it is foreseen that the concept of trust is important in the satisfaction that constitutes the main subject of the study in health services and in the negative situations experienced in different subjects. Based on these insights, initiatives can be taken to increase trust in the health system through health policies to be designed. Furthermore, the results highlight the growing importance of digitalization and digital hospitals in healthcare. Further progress in this direction will increase satisfaction with health examinations and contribute to positive results in health services.

  13. ECMWF ERA5: ensemble means of surface level analysis parameter data

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Jul 7, 2025
    + more versions
    Cite
    European Centre for Medium-Range Weather Forecasts (ECMWF) (2025). ECMWF ERA5: ensemble means of surface level analysis parameter data [Dataset]. https://catalogue.ceda.ac.uk/uuid/d8021685264e43c7a0868396a5f582d0
    Explore at:
    Dataset updated
    Jul 7, 2025
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    European Centre for Medium-Range Weather Forecasts (ECMWF)
    License

    https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf

    Area covered
    Earth
    Variables measured
    cloud_area_fraction, sea_ice_area_fraction, air_pressure_at_mean_sea_level, lwe_thickness_of_atmosphere_mass_content_of_water_vapor
    Description

    This dataset contains ERA5 surface level analysis parameter data ensemble means (see linked dataset for spreads). ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF); see linked documentation for further details. The ensemble means and spreads are calculated from the ERA5 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record.

    Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and was thus calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data.
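    The difference between dividing by 10 and by 9 can be sketched with Python's standard library. The ten member values below are made up for illustration and are not ERA5 data; the function name ensemble_mean_and_spread is hypothetical.

```python
import statistics


def ensemble_mean_and_spread(members):
    """Return (mean, spread), where spread divides by N, not N-1."""
    mean = statistics.fmean(members)
    spread = statistics.pstdev(members)  # population standard deviation (divide by N)
    return mean, spread


# Made-up values for one grid point across a 10 member ensemble.
members = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

if __name__ == "__main__":
    mean, spread = ensemble_mean_and_spread(members)
    sample_std = statistics.stdev(members)  # divides by N-1, shown for contrast
    print("ensemble mean:", mean)
    print("ensemble spread (divide by 10):", spread)
    print("sample standard deviation (divide by 9):", sample_std)
```

    The spread is always slightly smaller than the sample standard deviation of the same members, so the two should not be mixed when comparing uncertainty estimates.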

    The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim re-analysis projects.

    An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed ahead of being released by ECMWF as quality assured data within 3 months. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record. However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, and so new runs to address this issue were performed, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.

  14. f

    Data from: Improving geological logging of drill holes using geochemical...

    • tandf.figshare.com
    pdf
    Updated Mar 21, 2024
    Cite
    E. J. Hill; A. Fabris; Y. Uvarova; C. Tiddy (2024). Improving geological logging of drill holes using geochemical data and data analytics for mineral exploration in the Gawler Ranges, South Australia [Dataset]. http://doi.org/10.6084/m9.figshare.16699519.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    E. J. Hill; A. Fabris; Y. Uvarova; C. Tiddy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia, South Australia, Gawler Ranges
    Description

    Geochemical data are frequently collected from mineral exploration drill-hole samples to more accurately define and characterise the geological units intersected by the drill hole. However, large multi-element data sets are slow and challenging to interpret without using some form of automated analysis, such as mathematical, statistical or machine learning techniques. Automated analysis techniques also have the advantage in that they are repeatable and can provide consistent results, even for very large data sets. In this paper, an automated litho-geochemical interpretation workflow is demonstrated, which includes data exploration and data preparation using appropriate compositional data-analysis techniques. Multiscale analysis using a modified wavelet tessellation has been applied to the data to provide coherent geological domains. Unsupervised machine learning (clustering) has been used to provide a first-pass classification. The results are compared with the detailed geologist’s logs. The comparison shows how the integration of automated analysis of geochemical data can be used to enhance traditional geological logging and demonstrates the identification of new geological units from the automated litho-geochemical logging that were not apparent from visual logging but are geochemically distinct. To reduce computational complexity and facilitate interpretation, a subset of geochemical elements is selected, and then a centred log-ratio transform is applied. The wavelet tessellation method is used to domain the drill holes into rock units at a range of scales. Several clustering methods were tested to identify distinct rock units in the samples and multiscale domains for classification. Results are compared with geologist’s logs to assess how geochemical data analysis can inform and improve traditional geology logs.
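    The centred log-ratio transform mentioned above can be sketched as follows. This is a generic illustration of the transform, not the authors' workflow; the function name centred_log_ratio and the sample concentrations are made up. Each component is log-transformed and the mean of the logs is subtracted, which is equivalent to dividing by the geometric mean of the sample before taking logs.

```python
import math


def centred_log_ratio(composition):
    """Apply the centred log-ratio (clr) transform to one sample.

    composition: positive concentrations of the selected elements,
    all in the same unit (e.g. ppm).
    """
    if any(value <= 0 for value in composition):
        raise ValueError("clr requires strictly positive values")
    logs = [math.log(value) for value in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    # Made-up concentrations for three elements in one drill-hole sample.
    sample = [1.0, 10.0, 100.0]
    print(centred_log_ratio(sample))
```

    The transformed values of a sample sum to zero, and rescaling every concentration by the same factor leaves them unchanged, which is why the transform suits compositional data. Zero or below-detection values need separate handling before the transform, which this sketch does not attempt.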

  15. Compounds for Studying Environmental Exposures

    • kaggle.com
    zip
    Updated Jan 4, 2023
    Cite
    The Devastator (2023). Compounds for Studying Environmental Exposures [Dataset]. https://www.kaggle.com/datasets/thedevastator/pubchemlite-compound-collection-for-exposomics-3
    Explore at:
    Available download formats: zip (46715153 bytes)
    Dataset updated
    Jan 4, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Compounds for Studying Environmental Exposures

    PubChemLite: Annotation Categories for Translational and Applied Research

    By [source]

    About this dataset

    The PubChemLite Compound Collection for Exposomics is a compilation of over 371,000 chemicals from a diverse range of areas and application domains. The library provides data on molecular structure and composition, annotation categories, and chemical functionality, as well as information about associated disorders and diseases. It covers fields ranging from tumorology to drug discovery and from nutrition to toxicology, all enriched with PubMed papers and patents related to each substance. The collection also includes safety information on the pharmacological effects of each compound, as well as its toxicity profile when exposed in vitro or when metabolised by the liver. For food-related substances, the FoodRelated field provides further details on whether they are suitable for human consumption. With its range of annotation categories, the collection can give insight into how the environment affects human health, giving researchers access to evidence-backed source data for pursuing questions in exposomics.

    How to use the dataset

    This dataset provides an invaluable resource for research in a range of fields, including tumorology, drug-discovery, food and nutrition, toxicology, and many others. It can be used to explore the relationships between various chemicals and related biological effects.

    In order to use the PubChemLite Compound Collection for Exposomics effectively and efficiently there are several key steps to follow:

    • Familiarize yourself with the columns in the dataset. There are 15 columns available in this dataset which provide information on a range of topics as well as relevant annotation types related to each chemical compound. By understanding which columns are most relevant you can better focus your investigations into specific areas of interest.

    • Analyze each column according to its type. Columns hold different data types (e.g., integer values for PubMed_Count). Make sure you understand how these data types affect the way you interpret the data and which analysis techniques apply. Also check whether rows need to be filtered by some criterion before you investigate them individually.

    • Use visualization tools to explore patterns within specific variables, or relationships between them, where needed. Plotting libraries such as seaborn (for example, for box plots) may be used where suitable.
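The first two steps above can be sketched with the Python standard library. This is only an illustration: the column names come from the column listing for this dataset, but the two data rows and the helper name `load_rows` are hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io

# Hypothetical two-row excerpt; the real file is
# PubChemLite_31Oct2020_exposomics.csv and has more columns.
SAMPLE = """FirstBlock,PubMed_Count,Patent_Count
AAAAAAAAAAAAAA,12,3
BBBBBBBBBBBBBB,0,7
"""


def load_rows(handle):
    """Read rows and convert the count columns from text to integers."""
    rows = []
    for row in csv.DictReader(handle):
        row["PubMed_Count"] = int(row["PubMed_Count"])
        row["Patent_Count"] = int(row["Patent_Count"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
# Filter before further analysis: keep compounds mentioned in PubMed.
mentioned = [r for r in rows if r["PubMed_Count"] > 0]
```

To run this against the downloaded file, pass an open file handle to `load_rows` instead of the `io.StringIO` sample.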

    Research Ideas

    • Developing a personalized nutrition plan by correlating individual food intake to the associated chemical compounds for better understanding of nutrient absorption and health effects.
    • Understanding reproducibility in drug-discovery and drug safety with detailed analysis of PubMed, Patent and Toxicity information linked to each compound in the dataset.
    • Identifying new opportunities for agrochemical research and product development through visibility into AgroChemInfo annotation data linked to key compounds found in the dataset.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: PubChemLite_31Oct2020_exposomics.csv

    | Column name | Description |
    |:-------------|:-----------------------------------------------------------------------------------|
    | FirstBlock | A unique identifier for each chemical compound. (String) |
    | PubMed_Count | The number of times the chemical compound has been mentioned in PubMed. (Integer) |
    | Patent_Count | The number of times the chemical compound has been mentioned in patents. (Integer) |
    | Synonym | A list of alternative names for the chemical ...

  16. Product Retail Prices per month from 2017-2025

    • kaggle.com
    zip
    Updated Apr 13, 2025
    Cite
    Aradhana Hirapara (2025). Product Retail Prices per month from 2017-2025 [Dataset]. https://www.kaggle.com/datasets/aradhanahirapara/product-retail-price-survey-2017-2025
    Explore at:
    Available download formats: zip (2543973 bytes)
    Dataset updated
    Apr 13, 2025
    Authors
    Aradhana Hirapara
    Description

    This dataset contains monthly retail price data for a wide range of consumer products sold in various Canadian provinces over several years. It has been enriched with tax, category, and classification metadata for deeper insights.

    Usefulness of the Dataset

    This dataset can be used for:

    | Use Case | Description |
    |:-----------------------------|:-----------------------------------------------------------------|
    | Price Trend Analysis | Track price movements over time, province, and product category. |
    | Inflation Studies | Examine inflation on essentials vs non-essentials over time. |
    | Regional Price Comparison | Analyze cost disparities for the same goods across provinces. |
    | Tax Policy Impact | Understand how tax laws affect consumer pricing by region. |
    | Budget Optimization | Identify high-cost vs low-cost essentials for better planning. |
    | Machine Learning Integration | Use in models for price prediction or consumer segmentation. |

    Purpose and Use Cases

    This dataset is ideal for:

    🏛️ Policy Analysis

    Understand how federal and provincial taxes shape price access — especially for essentials like milk, bread, or medications.

    🧍‍♀️ Consumer Insights

    See how costs for personal care, food, and baby goods evolve month-over-month in each region.

    💸 Inflation & Seasonality

    Analyze how monthly or yearly trends (e.g., holiday spikes or inflation events) affect product pricing.

    🌍 Social Impact Studies

    Measure product accessibility gaps between provinces for low-income consumers or high-tax regions.

    🛍️ Retail & Budget Planning

    Guide families, retailers, or policymakers on where and when to buy or subsidize certain products.

  17. Arrests

    • hub.arcgis.com
    • open.tempe.gov
    • +7more
    Updated May 16, 2025
    + more versions
    Cite
    City of Tempe (2025). Arrests [Dataset]. https://hub.arcgis.com/datasets/tempegov::police-transparency-arrests-all-data-related-tables-normalized?layer=0
    Explore at:
    Dataset updated
    May 16, 2025
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Related Tables / Normalized Version

    This dataset provides demographic information related to arrests made by the Tempe Police Department. Demographic fields include race and ethnicity, age range at the time of arrest, and gender for each party. The data is sourced from the Police Department’s Records Management System (RMS) and supports analysis of patterns related to arrests, enforcement activity, and demographic trends over time. This information is a component of ongoing efforts to promote transparency and provide context for law enforcement within the community.

    For detailed guidance on interpreting arrest counts and demographic breakdowns, please refer to the User Guide: Understanding the Arrest Demographic Datasets - Related Tables.

    Why Is This Dataset Organized This Way?

    The related tables, such as persons, charges, and locations, follow a normalized data model. This structure is often preferred by data professionals for more advanced analysis, filtering, or joining with external datasets. Providing this format supports a wide range of users, from casual data explorers to experienced analysts.

    Understanding the Arrests Data (as related tables)

    The related tables represent different parts of the arrest data. Each one focuses on a different type of information, like the officers, individuals arrested, charges, and arrest details. All of these tables connect back to the arrests table, which acts as the central record for each event. This structure is called a normalized model and is often used to manage data in a more efficient way. Visit the User Guide: Understanding the Arrest Demographic Datasets - Related Tables for more details outlining the relationships between the related tables.

    Data Dictionary

    Additional Information

    Contact Email: PD_DataRequest@tempe.gov
    Contact Phone: N/A
    Link: N/A
    Data Source: Versaterm RMS
    Data Source Type: SQL Server
    Preparation Method: Automated process
    Publish Frequency: Daily
    Publish Method: Automatic

  18. Logit regression model and variance inflation factors.

    • plos.figshare.com
    xls
    Updated May 29, 2025
    Cite
    Kübranur Çebi Karaaslan; Hülya Diğer; Tubanur Çebi (2025). Logit regression model and variance inflation factors. [Dataset]. http://doi.org/10.1371/journal.pone.0324125.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    May 29, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Kübranur Çebi Karaaslan; Hülya Diğer; Tubanur Çebi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Logit regression model and variance inflation factors.

  19. Findings on factors affecting individuals’ satisfaction with medical...

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 29, 2025
    Cite
    Kübranur Çebi Karaaslan; Hülya Diğer; Tubanur Çebi (2025). Findings on factors affecting individuals’ satisfaction with medical examination services (N: 9502). [Dataset]. http://doi.org/10.1371/journal.pone.0324125.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 29, 2025
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Kübranur Çebi Karaaslan; Hülya Diğer; Tubanur Çebi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Findings on factors affecting individuals’ satisfaction with medical examination services (N: 9502).

  20. Mountain Lion Predicted Habitat - CWHR M165 [ds2616]

    • data.cnra.ca.gov
    • data.ca.gov
    • +4more
    Updated Sep 11, 2023
    Cite
    California Department of Fish and Wildlife (2023). Mountain Lion Predicted Habitat - CWHR M165 [ds2616] [Dataset]. https://data.cnra.ca.gov/dataset/mountain-lion-predicted-habitat-cwhr-m165-ds2616
    Explore at:
    Available download formats: arcgis geoservices rest api, html
    Dataset updated
    Sep 11, 2023
    Dataset authored and provided by
    California Department of Fish and Wildlife https://wildlife.ca.gov/
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets used to create the predicted Habitat Suitability models include the CWHR range maps of California's regularly occurring vertebrates, which were digitized as GIS layers to support the predictions of the CWHR System software. These vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a lookup table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

    The models also used the CALFIRE-FRAP compiled "best available" land cover data known as Fveg. This compilation dataset was created as a single data layer to support the various analyses required for the Forest and Rangeland Assessment, a legislatively mandated function. These data are being updated to support ongoing analyses and to prepare for the next FRAP assessment in 2015. An accurate depiction of the spatial distribution of habitat types within California is required for a variety of legislatively mandated government functions.

    The California Department of Forestry and Fire Protection's CALFIRE Fire and Resource Assessment Program (FRAP), in cooperation with the California Department of Fish and Wildlife VegCamp program and with extensive use of USDA Forest Service Region 5 Remote Sensing Laboratory (RSL) data, has compiled the "best available" land cover data for California into a single comprehensive statewide data set. The data span a period from approximately 1990 to 2014. Typically the most current, detailed and consistent data were collected for various regions of the state. Decision rules were developed that controlled which layers were given priority in areas of overlap. Cross-walks were used to compile the various sources into the common classification scheme, the California Wildlife Habitat Relationships (CWHR) system.

    CWHR range data was used together with the FVEG vegetation maps and CWHR habitat suitability ranks to create Predicted Habitat Suitability maps for species. The Predicted Habitat Suitability maps show the mean habitat suitability score for the species, as defined in CWHR. CWHR defines habitat suitability as NO SUITABILITY (0), LOW (0.33), MEDIUM (0.66), or HIGH (1) for reproduction, cover, and feeding for each species in each habitat stage (habitat type, size, and density combination). The mean is the average of the reproduction, cover, and feeding scores, and can be interpreted as LOW (less than 0.34), MEDIUM (0.34-0.66), and HIGH (greater than 0.66) suitability. Note that habitat suitability ranks were developed based on habitat patch sizes >40 acres in size, and are best interpreted for habitat patches >200 acres in size. The CWHR Predicted Habitat Suitability rasters are named according to the 4-digit alphanumeric species CWHR ID code. The CWHR Species Lookup Table contains a record for each species including its CWHR ID, scientific name, common name, and range map revision history (available for download at https://www.wildlife.ca.gov/Data/CWHR).
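The mean suitability score and its interpretation bands described above can be sketched as follows. This is an illustrative reading of the thresholds stated in the description, not CWHR software; the function names are hypothetical, and rounding the mean to two decimals before classifying is an assumption made here so that floating-point noise does not push a score of 0.66 over the boundary.

```python
def mean_suitability(reproduction, cover, feeding):
    """Average of the three CWHR scores (each 0, 0.33, 0.66, or 1)."""
    return (reproduction + cover + feeding) / 3.0


def suitability_class(mean_score):
    """Map a mean score to the LOW / MEDIUM / HIGH bands from the description."""
    score = round(mean_score, 2)  # assumption: compare at two decimals
    if score < 0.34:
        return "LOW"
    if score <= 0.66:
        return "MEDIUM"
    return "HIGH"


# Hypothetical habitat stage: HIGH reproduction, MEDIUM cover, HIGH feeding.
score = mean_suitability(1, 0.66, 1)
label = suitability_class(score)
```

The description gives no separate band for a mean of zero, so this sketch places it in LOW under the "less than 0.34" rule.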
