100+ datasets found
  1. Retail Product Dataset with Missing Values

    • kaggle.com
    zip
    Updated Feb 17, 2025
    Cite
    Himel Sarder (2025). Retail Product Dataset with Missing Values [Dataset]. https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values
    Explore at:
    Available download formats: zip (47826 bytes)
    Dataset updated
    Feb 17, 2025
    Authors
    Himel Sarder
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).

    The dataset includes:
    - Category (Categorical): Product category (A, B, C, D)
    - Price (Numerical): Randomized product prices
    - Rating (Numerical): Ratings between 1 and 5
    - Stock (Categorical): Availability status (In Stock, Out of Stock)
    - Discount (Numerical): Discount percentage

    This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
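
    As a minimal sketch of the missing-data handling this entry describes, the snippet below measures the share of missing cells per column and then fills gaps with the median (numerical columns) or the mode (categorical columns). The column names come from the description above; the sample rows are invented for illustration, and treating an empty cell as "missing" is an assumption about how the real file encodes gaps.

```python
import csv
import io
import statistics

# Hypothetical rows that mimic the described columns; NOT the real dataset.
# Assumption: a missing value appears as an empty cell.
SAMPLE = """Category,Price,Rating,Stock,Discount
A,10.5,4,In Stock,5
B,,3,,10
,20.0,,Out of Stock,
A,15.0,5,In Stock,
"""

def missing_share(rows, columns):
    """Return the fraction of empty cells for each column."""
    total = len(rows)
    return {c: sum(1 for r in rows if r[c] == "") / total for c in columns}

def impute(rows, numeric, categorical):
    """Fill numeric gaps with the column median and categorical gaps with the mode."""
    fills = {}
    for c in numeric:
        fills[c] = statistics.median(float(r[c]) for r in rows if r[c] != "")
    for c in categorical:
        fills[c] = statistics.mode(r[c] for r in rows if r[c] != "")
    return [{c: (r[c] if r[c] != "" else str(fills[c])) for c in r} for r in rows]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
columns = ["Category", "Price", "Rating", "Stock", "Discount"]
shares = missing_share(rows, columns)
filled = impute(rows, ["Price", "Rating", "Discount"], ["Category", "Stock"])

for name, share in shares.items():
    print(f"{name}: {share:.0%} missing")
```

    Median and mode are only one simple choice; with missingness as heavy as the 63% reported for one column, dropping that column or using a model-based method may be more defensible.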

  2. Replication data for: A Unified Approach To Measurement Error And Missing...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 17, 2016
    Cite
    Matthew Blackwell; James Honaker; Gary King (2016). Replication data for: A Unified Approach To Measurement Error And Missing Data: Overview [Dataset]. http://doi.org/10.7910/DVN/29606
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 17, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Matthew Blackwell; James Honaker; Gary King
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.7910/DVN/29606

    Description

    Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2014b). Notes: This is the first of two articles to appear in the same issue of the same journal by the same authors. The second is “A Unified Approach to Measurement Error and Missing Data: Details and Extensions.” See also: Missing Data

  3. Data from: A hierarchical Bayesian approach for handling missing...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Mar 22, 2019
    Cite
    Alison C. Ketz; Therese L. Johnson; Mevin B. Hooten; M. Thompson Hobbs (2019). A hierarchical Bayesian approach for handling missing classification data [Dataset]. http://doi.org/10.5061/dryad.8h36t01
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 22, 2019
    Dataset provided by
    National Park Service
    Colorado State University
    Authors
    Alison C. Ketz; Therese L. Johnson; Mevin B. Hooten; M. Thompson Hobbs
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Southwest US
    Description

    Ecologists use classifications of individuals in categories to understand composition of populations and communities. These categories might be defined by demographics, functional traits, or species. Assignment of categories is often imperfect, but frequently treated as observations without error. When individuals are observed but not classified, these “partial” observations must be modified to include the missing data mechanism to avoid spurious inference.

    We developed two hierarchical Bayesian models to overcome the assumption of perfect assignment to mutually exclusive categories in the multinomial distribution of categorical counts, when classifications are missing. These models incorporate auxiliary information to adjust the posterior distributions of the proportions of membership in categories. In one model, we use an empirical Bayes approach, where a subset of data from one year serves as a prior for the missing data the next. In the other approach, we use a small random sample of data within a year to inform the distribution of the missing data.

    We performed a simulation to show the bias that occurs when partial observations were ignored and demonstrated the altered inference for the estimation of demographic ratios. We applied our models to demographic classifications of elk (Cervus elaphus nelsoni) to demonstrate improved inference for the proportions of sex and stage classes.

    We developed multiple modeling approaches using a generalizable nested multinomial structure to account for partially observed data that were missing not at random for classification counts. Accounting for classification uncertainty is important to accurately understand the composition of populations and communities in ecological studies.

  4. PPT4J - Data

    • zenodo.org
    bin, txt
    Updated Dec 20, 2023
    Cite
    Zhiyuan Pan (2023). PPT4J - Data [Dataset]. http://doi.org/10.5281/zenodo.10397354
    Explore at:
    Available download formats: bin, txt
    Dataset updated
    Dec 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhiyuan Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Download all files to get the dataset. Please check the MD5 checksums after download.

    1. Run cat ppt4j_data.tar.xz.part* > ppt4j_data.tar.xz to get the complete archive.
    2. Run awk '{print $2 " " $1}' MD5.txt > MD5.chk && md5sum --ignore-missing --check MD5.chk to check the integrity of downloaded files. The format of MD5.txt is not compatible with md5sum, so the awk command is employed to fix this. Sorry for the inconvenience.
    3. Extract ppt4j_data.tar.xz, then follow the instructions at https://github.com/pan2013e/ppt4j. The tarball file is created in macOS with bsdtar, and you may notice warnings like tar: Ignoring unknown extended header keyword 'XXX' if you extract it in Linux with gnutar. You can just ignore these warnings, as long as the checksum is okay.
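
    Steps 1 and 2 above can also be sketched in Python for systems without cat, awk, or md5sum. This is a hedged illustration, not the authors' tooling: the helper names are hypothetical, the demo uses small invented files in a temporary directory, and the "filename first, hash second" layout of MD5.txt is an assumption inferred from the awk column swap in step 2.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def join_parts(directory, pattern, target):
    """Concatenate files matching pattern in sorted name order (like cat part* > target)."""
    out = directory / target
    with out.open("wb") as dst:
        for part in sorted(directory.glob(pattern)):
            with part.open("rb") as src:
                shutil.copyfileobj(src, dst)
    return out

def md5_of(path, chunk=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def verify(directory, listing):
    """Check each 'name hash' line; absent files are skipped (like --ignore-missing)."""
    results = {}
    for line in listing.read_text().splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, expected = fields  # assumed order: filename, then MD5 hash
        path = directory / name
        if path.exists():
            results[name] = md5_of(path) == expected.lower()
    return results

# Demo with invented files; replace the names with the real parts and MD5.txt.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo.tar.xz.part00").write_bytes(b"hello ")
    (root / "demo.tar.xz.part01").write_bytes(b"world")
    archive = join_parts(root, "demo.tar.xz.part*", "demo.tar.xz")
    joined = archive.read_bytes()
    (root / "MD5.txt").write_text(
        f"demo.tar.xz {hashlib.md5(b'hello world').hexdigest()}\n"
        "absent.bin 00000000000000000000000000000000\n"
    )
    report = verify(root, root / "MD5.txt")
    print(report)
```

    Sorting the part names reproduces the order in which a shell glob would expand them, which matters because the archive is only valid if the parts are joined in sequence.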

  5. Identifying Heat Waves in Florida: Considerations of Missing Weather Data

    • figshare.com
    docx
    Updated May 31, 2023
    Cite
    Emily Leary; Linda J. Young; Chris DuClos; Melissa M. Jordan (2023). Identifying Heat Waves in Florida: Considerations of Missing Weather Data [Dataset]. http://doi.org/10.1371/journal.pone.0143471
    Explore at:
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Emily Leary; Linda J. Young; Chris DuClos; Melissa M. Jordan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Florida
    Description

    Background: Using current climate models, regional-scale changes for Florida over the next 100 years are predicted to include warming over terrestrial areas and very likely increases in the number of high temperature extremes. No uniform definition of a heat wave exists. Most past research on heat waves has focused on evaluating the aftermath of known heat waves, with minimal consideration of missing exposure information.
    Objectives: To identify and discuss methods of handling and imputing missing weather data and how those methods can affect identified periods of extreme heat in Florida.
    Methods: In addition to ignoring missing data, temporal, spatial, and spatio-temporal models are described and utilized to impute missing historical weather data from 1973 to 2012 from 43 Florida weather monitors. Calculated thresholds are used to define periods of extreme heat across Florida.
    Results: Modeling of missing data and imputing missing values can affect the identified periods of extreme heat, through the missing data itself or through the computed thresholds. The differences observed are related to the amount of missingness during June, July, and August, the warmest months of the warm season (April through September).
    Conclusions: Missing data considerations are important when defining periods of extreme heat. Spatio-temporal methods are recommended for data imputation. A heat wave definition that incorporates information from all monitors is advised.

  6. Comprehensive EDA of Residential Features

    • kaggle.com
    zip
    Updated Nov 23, 2025
    Cite
    Coding expert G.N (2025). Comprehensive EDA of Residential Features [Dataset]. https://www.kaggle.com/datasets/ranaghulamnabi/comprehensive-eda-of-residential-features
    Explore at:
    Available download formats: zip (4762 bytes)
    Dataset updated
    Nov 23, 2025
    Authors
    Coding expert G.N
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context:

    The dataset contains 545 entries (rows) and 13 features (columns). It is a clean dataset with no missing values in any column, so you can skip the standard null-value imputation step. The 13 columns, including the target price, comprise 7 numerical and 6 categorical columns. Because the data is clean, the best next step is to start your Exploratory Data Analysis (EDA).

    Feature Distribution:

  7. Summary statistics of 581 patients’ number of observations

    • plos.figshare.com
    xls
    Updated Apr 21, 2025
    Cite
    Ji Soo Kim; Ami A. Shah; Laura K. Hummers; Scott L. Zeger (2025). Summary statistics of 581 patients’ number of observations [Dataset]. http://doi.org/10.1371/journal.pone.0320414.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ji Soo Kim; Ami A. Shah; Laura K. Hummers; Scott L. Zeger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary statistics of 581 patients’ number of observations

  8. ComBat HarmonizR enables the integrated analysis of independently generated...

    • ebi.ac.uk
    Updated May 23, 2022
    Cite
    Hannah Voß (2022). ComBat HarmonizR enables the integrated analysis of independently generated proteomic datasets through data harmonization with appropriate handling of missing values [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD027467
    Explore at:
    Dataset updated
    May 23, 2022
    Authors
    Hannah Voß
    Variables measured
    Proteomics
    Description

    The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome limitations in statistically underpowered sample cohorts but has not been demonstrated to this day. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can deal with missing at random (MAR) and missing not at random (MNAR) type values at the same time. Algorithms for batch effect removal, such as the ComBat algorithm, commonly used for other omics types, disregard proteins with MNAR missing values and significantly reduce the informational yield and the effect size for combined datasets. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, that performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior for removing batch effects between different Tandem Mass Tag (TMT)-plexes, compared to commonly used internal reference scaling (iRS). Due to the matrix dissection approach without the need for data imputation, the HarmonizR algorithm can be applied to any type of -omics data while assuring minimal data loss.

  9. Men's Year-End ATP Rankings

    • kaggle.com
    zip
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). Men's Year-End ATP Rankings [Dataset]. https://www.kaggle.com/datasets/thedevastator/men-s-year-end-atp-rankings-1972-2016
    Explore at:
    Available download formats: zip (643896 bytes)
    Dataset updated
    Jan 15, 2023
    Authors
    The Devastator
    Description

    Men's Year-End ATP Rankings

    A Global Perspective

    By Granger Huntress [source]

    About this dataset

    This dataset provides a comprehensive look at the world of men's professional tennis throughout the Open Era. Every year, a new crop of tennis players has emerged to challenge long-standing traditions, while others have continued to maintain their place near the top. Through this dataset you will uncover which players succeeded in reaching or maintaining their ranking positions in the record books and how they navigated through changing eras in men’s professional tennis. Dive deep into what makes these successful athletes stand out from the rest and make impacts on their game year after year with an overview of invaluable data provided by this collection from first name, birthdate, country of origin, handedness, date range for records kept and more importantly their ATP career end rankings. Whether you are interested in a snapshot view to analyze long term trends or want to get inside insights on why top players succeed—this analysis provides invaluable resources to explore men's ATP rankings throughout its Open Era journey

    How to use the dataset

    This dataset is a comprehensive source of men's year-end rankings during the Open Era. Each record includes information on the player's ranking, name, birthdate, country of origin, and handedness. This dataset can be used to study the trends in professional tennis throughout this time period and analyze how they have changed over time.

    To use this dataset effectively one should first explore the data by reviewing some basic statistics. Examples include summary statistics such as total players by country or average ranking across years. Summarizing the data will help get a quick understanding of what the data is composed of and any existing patterns that may be present in it.

    Another important step you can take before analyzing this data deeper would be to check for missing values or outliers within it that could affect your results if ignored or not handled appropriately. Having an understanding about any potential issues with your data like these can save you from potentially misinterpreting results due to an incomplete analysis process at some later point in time after further exploration and analysis has been done with it.

    Once you have an overview of the dataset and have addressed potential issues, start a more detailed exploration of questions about professional tennis during this period, such as: How did various nations perform over different years? Who was consistently ranked among the top 10 players throughout this period? Are there trends associated with handedness? Answering questions like these requires choosing analyses appropriate to the available variables, so keep that in mind when trying to pin down connections between variables using techniques such as correlation or linear regression. Visualizations can also help make sense of complex multivariable relationships among several parameters at once, so include them whenever possible. This helps you maximize accuracy when examining both individual components and overall summary statistics for tennis rankings across the years covered in the Open Era.

    Research Ideas

    • Analyzing the global trends in men's tennis in the Open Era over time by examining shifts in countries represented at each year-end ranking.
    • Examining the effectiveness of different opponents based on their handedness (right or left) when compared to men's handedness throughout the Open Era.
    • Tracking and predicting future player rankings based on birthdates, country, and other relevant factors that influence performance

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: ltdPlayerMaster.csv

    | Column name | Description |
    |:------------|:------------|
    | FIRST | First name of the player. (String) |
    | LAST | Last name of the player. (String) |
    | HAND | Handedness of the player (Right or Left). ...

  10. Virtual Sensors: Efficiently Estimating Missing Spectra - Dataset - NASA...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Virtual Sensors: Efficiently Estimating Missing Spectra - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/virtual-sensors-efficiently-estimating-missing-spectra
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Various instruments are used to create images of the Earth and other objects in the universe in a diverse set of wavelength bands with the aim of understanding natural phenomena. Sometimes these instruments are built in a phased approach, with additional measurement capabilities added in later phases. In other cases, technology may mature to the point that the instrument offers new measurement capabilities that were not planned in the original design of the instrument. In still other cases, high resolution spectral measurements may be too costly to perform on a large sample and therefore lower resolution spectral instruments are used to take the majority of measurements. Many applied science questions that are relevant to the earth science remote sensing community require analysis of enormous amounts of data that were generated by instruments with disparate measurement capabilities. This paper addresses this problem using Virtual Sensors: a method that uses models trained on spectrally rich (high spectral resolution) data to "fill in" unmeasured spectral channels in spectrally poor (low spectral resolution) data. The models we use in this paper are Multi-Layer Perceptrons (MLPs), Support Vector Machines (SVMs) with Radial Basis Function (RBF) kernels and SVMs with Mixture Density Mercer Kernels (MDMK). We demonstrate this method by using models trained on the high spectral resolution Terra MODIS instrument to estimate what the equivalent of the MODIS 1.6 micron channel would be for the NOAA AVHRR/2 instrument. The scientific motivation for the simulation of the 1.6 micron channel is to improve the ability of the AVHRR/2 sensor to detect clouds over snow and ice.

  11. Bandit algorithms defined by allocation probability πk,t or index value...

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Xijin Chen; Kim May Lee; Sofia S. Villar; David S. Robertson (2023). Bandit algorithms defined by allocation probability πk,t or index value Ik,t. [Dataset]. http://doi.org/10.1371/journal.pone.0274272.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xijin Chen; Kim May Lee; Sofia S. Villar; David S. Robertson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bandit algorithms defined by allocation probability πk,t or index value Ik,t.

  12. Data from: Rewilded mammal assemblages reveal the missing ecological...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jul 25, 2018
    Cite
    Charlotte H. Mills; Christopher E. Gordon; Mike Letnic (2018). Rewilded mammal assemblages reveal the missing ecological functions of granivores [Dataset]. http://doi.org/10.5061/dryad.c565c
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 25, 2018
    Dataset provided by
    University of Wollongong
    UNSW Sydney
    Authors
    Charlotte H. Mills; Christopher E. Gordon; Mike Letnic
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Central Australia, Scotia, Roxby Downs
    Description
    1. Rewilding is a strategy for ecological restoration that uses reintroductions of animals to re-establish the ecological functions of keystone species. Globally, rewilding efforts have focused primarily on reinstating the ecological functions of charismatic megafauna. In Australia, rewilding efforts have focused on restoring the ecological functions of herbivorous and omnivorous rodents and marsupials weighing between 30-5000g inside of predator-proof exclosures.
    2. In many arid ecosystems, mammals are considered the dominant seed predators. In Australian deserts, ants are considered to be the primary removers and predators of seeds and mammals unimportant removers and predators of seeds. However, most research on granivory in Australian deserts has occurred in areas where native mammals were functionally extinct.
    3. Here, we compare rates of seed removal by mammals and ants on shrub seeds and abundance of shrub seedlings in two rewilded desert ecosystems (Arid Recovery Reserve and Scotia Wildlife Sanctuary) with adjacent areas possessing depauperate mammal faunas. We used foraging trays containing seeds of common native shrubs (Acacia ligulata and Dodonaea viscosa) to examine rates of seed removal by ants and mammals. We quantified the abundance of A. ligulata and D. viscosa seedlings inside and outside of rewilded areas along belt transects.
    4. By excluding ants and mammals from foraging trays, we show that ants removed more seeds than mammals where mammal assemblages were depauperate, but mammals removed far more seeds than ants in rewilded areas. Shrub seedlings were more abundant in areas with depauperate mammal faunas than in rewilded areas.
    5. Our study provides evidence that rewilding of desert mammal assemblages has restored the hitherto unappreciated ecological function of omnivorous rodents and bettongs as seed predators. We hypothesize that the loss of omnivorous mammals may be a factor that has facilitated shrub encroachment in arid Australia.
    6. We contend that rewilding programs aimed at restoring ecological processes should not ignore consumers with relatively lower per capita consumptive effects. This is because consumers with low per capita consumptive effects often occur at high population densities or perform critical ecological functions and thus may have significant population level impacts that can be harnessed for ecological restoration.

  13. Data from: PHYLACINE 1.2: The Phylogenetic Atlas of Mammal Macroecology

    • data-staging.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated May 11, 2019
    Cite
    Søren Faurby; Matt Davis; Rasmus Østergaard Pedersen; Simon D. Schowanek; Alexandre Antonelli; Jens-Christian Svenning (2019). PHYLACINE 1.2: The Phylogenetic Atlas of Mammal Macroecology [Dataset]. http://doi.org/10.5061/dryad.bp26v20
    Explore at:
    Available download formats: zip
    Dataset updated
    May 11, 2019
    Dataset provided by
    Aarhus University
    University of Gothenburg
    Authors
    Søren Faurby; Matt Davis; Rasmus Østergaard Pedersen; Simon D. Schowanek; Alexandre Antonelli; Jens-Christian Svenning
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Africa, Australia, South America, Europe, North America, Oceania, Global, Asia
    Description

    Data needed for macroecological analyses are difficult to compile and often hidden away in supplementary material under non-standardized formats. Phylogenies, range data, and trait data often use conflicting taxonomies and require ad hoc decisions to synonymize species or fill in large amounts of missing data. Furthermore, most available data sets ignore the large impact that humans have had on species ranges and diversity. Ignoring these impacts can lead to drastic differences in diversity patterns and estimates of the strength of biological rules. To help overcome these issues, we assembled PHYLACINE, The Phylogenetic Atlas of Mammal Macroecology. This taxonomically integrated platform contains phylogenies, range maps, trait data, and threat status for all 5,831 known mammal species that lived since the last interglacial (~130,000 years ago until present). PHYLACINE is ready to use directly, as all taxonomy and metadata are consistent across the different types of data, and files are provided in easy-to-use formats. The atlas includes both maps of current species ranges and present natural ranges, which represent estimates of where species would live without anthropogenic pressures. Trait data include body mass and coarse measures of life habit and diet. Data gaps have been minimized through extensive literature searches and clearly labelled imputation of missing values. The PHYLACINE database will be archived here as well as hosted online so that users may easily contribute updates and corrections to continually improve the data. This database will be useful to any researcher who wishes to investigate large scale ecological patterns. Previous versions of the database have already provided valuable information and have, for instance, shown that megafauna extinctions caused substantial changes in vegetation structure and nutrient transfer patterns across the globe. All data and metadata provided here represent PHYLACINE Version 1.2.0.

  14. Pattern of Human Concerns Data, 1957-1963 - Archival Version

    • search.gesis.org
    Updated Feb 1, 2001
    Cite
    Cantril, Hadley (2001). Pattern of Human Concerns Data, 1957-1963 - Archival Version [Dataset]. http://doi.org/10.3886/ICPSR07023
    Explore at:
    Dataset updated
    Feb 1, 2001
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    GESIS search
    Authors
    Cantril, Hadley
    License

    https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de441083

    Description

    Abstract (en): Of the 14 nations included in the original study, these data cover the following ten: Brazil, Cuba, Dominican Republic, India, Israel, Nigeria, Panama, United States, West Germany, and Yugoslavia. (The data for Egypt, Japan, the Philippines, and Poland are not available through ICPSR.) In India and Israel the interviews were conducted in two waves, with different samples. Besides ascertaining the usual personal information, the study employed a "Self-Anchoring Striving Scale," an open-ended scale asking the respondent to define hopes and fears for self and the nation, to determine the two extremes of a self-defined spectrum on each of several variables. After these subjective ratings were obtained, the respondents indicated their perceptions of where they and their nations stood on a hypothetical ladder at three different points in time. Demographic variables include the respondents' age, gender, marital status, and level of education. For more information on the samples, coding, and the means of measurement, see the related publication listed below.

    ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Checked for undocumented or out-of-range codes.

    Adult population of Brazil, Cuba, Dominican Republic, India, Israel, Nigeria, Panama, United States, West Germany and Yugoslavia.

    Separate samples were drawn in each country. All samples were intended to be crossnational, except for the kibbutz sample in Israel. However, both India samples underrepresent females, and the sample from Cuba was drawn exclusively from urban areas. In addition, the samples from Brazil, Cuba, the Dominican Republic, India, Nigeria, Panama, and the United States were weighted to achieve the intended representation.

    2006-01-12 All files were removed from dataset 13 and flagged as study-level files, so that they will accompany all downloads.

    (1) Because the original data format included some multiply punched variables, it is inappropriate to assume that the first response of a multiple response variable is more important than the rest: the current order of responses is an artifact of the technology used to record and recover them. It is even possible to have a missing data code followed by further substantive responses in some cases. (2) These data files were originally released separately, under ICPSR study numbers 7023-7031, 7085-7086, and 7258. They are now concatenated into one data collection as 7023. References in the codebooks to the old study numbers should be ignored. (3) The codebooks are also available together in one bound volume available upon request from ICPSR. (4) The codebook is provided by ICPSR as a Portable Document Format (PDF) file. The PDF file format was developed by Adobe Systems Incorporated and can be accessed using PDF reader software, such as Adobe Acrobat Reader. Information on how to obtain a copy of the Acrobat Reader is provided on the ICPSR Web site.

  15. Philadelphia Properties and Assessment History

    • s.cnmilf.com
    • catalog.data.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    City of Philadelphia (2025). Philadelphia Properties and Assessment History [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/philadelphia-properties-and-assessment-history
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    City of Philadelphia
    Area covered
    Philadelphia
    Description

    Some of the information in the open data files below may not yet reflect the data used to calculate the most recent tax year’s property value. If you see missing or incorrect info about your property, use this form to contact OPA to report the issue. Property characteristic and assessment history from the Office of Property Assessment for all properties in Philadelphia. See more information on how OPA assesses property and their reports on the quality of assessments. This data updates nightly. Please ignore the 'created by' date below - the date of August 2015 shows when this webpage, not the data, was created.

  16. Data from: Exact Bayesian inference for animal movement in continuous time

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Aug 5, 2016
    Cite
    Paul G. Blackwell; Mu Niu; Mark S. Lambert; Scott D. LaPoint (2016). Exact Bayesian inference for animal movement in continuous time [Dataset]. http://doi.org/10.5061/dryad.mv02k
    Available download formats
    zip
    Dataset updated
    Aug 5, 2016
    Dataset provided by
    Dryad
    Authors
    Paul G. Blackwell; Mu Niu; Mark S. Lambert; Scott D. LaPoint
    Time period covered
    Aug 5, 2015
    Area covered
    United Kingdom
    Description

    GPS locations for an adult female wild boar. Sequence of locations for an adult female wild boar fitted with a GPS collar. Values are times in minutes and co-ordinates in metres (from an arbitrary origin). Data are extracted from the study described in Quy, R. J., Massei, G., Lambert, M. S., Coats, J., Miller, L. A., and Cowan, D. P. (2014) Effects of a GnRH vaccine on the movement and activity of free-living wild boar (Sus scrofa), Wildlife Research 41, 185-193. WildBoarMEE.txt

  17. Data from: Quantifying changes in fish population stability using...

    • portal.edirepository.org
    csv
    Updated Feb 17, 2025
    Cite
    Jonathan Walter (2025). Quantifying changes in fish population stability using statistical early warnings of regime shifts [Dataset]. http://doi.org/10.6073/pasta/aa552e1f82f95a33c2ea657e3c0706e4
    Available download formats
    csv (81006 bytes)
    Dataset updated
    Feb 17, 2025
    Dataset provided by
    EDI
    Authors
    Jonathan Walter
    Time period covered
    1980 - 2023
    Variables measured
    n, slope, metric, spunit, p_value, species, t_value, slope_se
    Description

    This data package describes long-term trends in metrics describing population stability and used as statistical early warnings of regime shifts in 29 fish species that inhabit the San Francisco Bay-Delta in central California, USA. Metrics used in this study include spatial synchrony, temporal coefficient of variation (CV), and lag-1 temporal autocorrelation. Trends were measured using ordinary least squares linear regression.

       These derived data were developed from abundance (as CPUE) time series based on three long-term fish monitoring studies included in https://doi.org/10.6073/pasta/a29a6e674b0f8797e13fbc4b08b92e5b: the Fall Midwater Trawl Survey, Delta Juvenile Monitoring Program, and Bay Study. Selected data were from fall months (September to December) in 1980-2023, from midwater trawl and beach seine surveys for which sampling effort (e.g., tow volume) was recorded. Data on fish exceeding maximum length thresholds for age-0 fish were discarded, except for white sturgeon, where the maximum length threshold corresponded to approximately 10 years of age, the onset of reproductive maturity. Observations from different sampling stations were aggregated into 10 sub-regions (South San Francisco Bay, Central San Francisco Bay, San Pablo Bay, Napa River, Suisun Bay, Delta Confluence, South Delta, North Delta, San Joaquin River, Sacramento River), and midwater trawl samples and beach seine samples were considered separately because the methods sample distinct habitat types. Combinations of sub-region and sampling method were considered distinct spatial units.
    
       EWI metrics were measured in 5-year rolling windows to permit assessment of changes over time. The temporal CV and lag-1 autocorrelation were measured on individual spatial unit time series, ignoring windows with >1 year of missing data. The coefficient of variation divides the standard deviation by the mean. Lag-1 autocorrelation was measured as Pearson correlation. Spatial synchrony was measured across spatial units, ignoring spatial units with >1 year of missing data, and ignoring rolling windows where <3 spatial units had sufficient data. Spatial synchrony was measured as the mean of pairwise Spearman correlations. Trends in EWI metrics were measured only when there were at least 5 rolling window measurements spanning at least 10 years.
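
The rolling-window metrics described above can be sketched roughly as follows. This is a minimal illustration, not the data package's own code: the function names, the use of `None` for a missing year, the choice of the sample standard deviation, and the pairwise handling of missing years are all assumptions made for the example.

```python
import math
from itertools import combinations
from statistics import mean, stdev

WINDOW = 5        # rolling window length in years (from the description)
MAX_MISSING = 1   # windows with more than 1 missing year are ignored
MIN_UNITS = 3     # synchrony needs at least 3 spatial units with enough data


def pearson(x, y):
    """Pearson correlation; returns NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else math.nan


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            out[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return out


def too_sparse(window):
    """True when the window has more missing years (None) than allowed."""
    return sum(v is None for v in window) > MAX_MISSING


def window_cv(window):
    """Temporal CV: standard deviation divided by the mean (sample SD assumed)."""
    if too_sparse(window):
        return None
    obs = [v for v in window if v is not None]
    m = mean(obs)
    return stdev(obs) / m if m else None


def window_lag1(window):
    """Lag-1 autocorrelation as the Pearson correlation of x[t] with x[t+1]."""
    if too_sparse(window):
        return None
    pairs = [(a, b) for a, b in zip(window, window[1:])
             if a is not None and b is not None]
    if len(pairs) < 2:
        return None
    x, y = zip(*pairs)
    return pearson(x, y)


def window_synchrony(units):
    """Mean pairwise Spearman correlation across spatial units in one window."""
    usable = [u for u in units if not too_sparse(u)]
    if len(usable) < MIN_UNITS:
        return None
    rhos = []
    for a, b in combinations(usable, 2):
        pairs = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
        x, y = zip(*pairs)
        rhos.append(pearson(ranks(x), ranks(y)))  # Spearman = Pearson on ranks
    return mean(rhos)


def rolling(series, metric):
    """Apply a single-series metric to every 5-year window of a yearly series."""
    return [metric(series[i:i + WINDOW]) for i in range(len(series) - WINDOW + 1)]


# Hypothetical yearly CPUE values for one spatial unit; None marks a missing year.
cpue = [1.0, 2.0, 3.0, 4.0, 5.0, None, None]
cv_by_window = rolling(cpue, window_cv)  # the last window has 2 missing years -> None
```

A trend in any of these metrics would then be fitted by ordinary least squares over the window values, subject to the minimum of 5 measurements spanning at least 10 years stated above.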
    
  18. True Influence - Proprietary B2B Intent Data Feed (USA)

    • datarade.ai
    .json, .xml
    Updated Jun 18, 2020
    Cite
    True Influence (2020). True Influence - Proprietary B2B Intent Data Feed (USA) [Dataset]. https://datarade.ai/data-products/true-influence-proprietary-intent-data-feed
    Available download formats
    .json, .xml
    Dataset updated
    Jun 18, 2020
    Dataset provided by
    Anteriad, LLC
    Authors
    True Influence
    Area covered
    United States of America
    Description

    Our proprietary intent data is more expansive than what is available from data co-ops or single-source providers, delivering a comprehensive base for your advanced intent analysis. We monitor intent behavior by both executive and managerial customer personas, to help you develop a complete picture of an organization's buying dynamics.

    Our exclusive Identity Graph technology goes beyond simple reverse IP lookup to identify small and midsize companies that do not have dedicated IP addresses. Our advanced triangulation technologies are based on dozens of variables and pinpoint accounts, locations, and specific individuals who are expressing intent. This critical intent intelligence is either missing or ignored in most other data streams.

    Our AI, machine learning, and natural language analysis of content identifies precise topical interest and maps intent activity to our taxonomy of more than 7,000 B2B topics. And we can easily add new topics based upon customer requirements.

    The True Influence Relevance Engine™ analyzes intent activity on more than just frequency. We include activity type, topical relevance, and historical trends to find patterns that make intent a strategic differentiator for your solution. Our intent data can take your data-driven sales and marketing solution or service to the next level.

  19. Bias Characterization in Probabilistic Genotype Data and Improved Signal...

    • plos.figshare.com
    tiff
    Updated Jun 1, 2023
    Cite
    Cameron Palmer; Itsik Pe’er (2023). Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation [Dataset]. http://doi.org/10.1371/journal.pgen.1006091
    Available download formats
    tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Cameron Palmer; Itsik Pe’er
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.

  20. Data compilation of ciliates growth rate, grazing rate and gross gowth...

    • doi.pangaea.de
    • service.tib.eu
    • +1 more
    html, tsv
    Updated Jan 15, 2014
    Cite
    Sevrine Sailley; Christine Klaas (2014). Data compilation of ciliates growth rate, grazing rate and gross gowth efficiency from field and labratory experiments [Dataset]. http://doi.org/10.1594/PANGAEA.826106
    Available download formats
    html, tsv
    Dataset updated
    Jan 15, 2014
    Dataset provided by
    PANGAEA
    Authors
    Sevrine Sailley; Christine Klaas
    License

    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Oct 3, 1979
    Variables measured
    Taxon/taxa, Event label, Cell biovolume, Carbon per cell, Reference/source, Latitude of event, Longitude of event, Treatment: temperature, Gross growth efficiency, Ciliates, cell biovolume, and 7 more
    Description

    The present data compilation includes ciliates growth rate, grazing rate and gross growth efficiency determined either in the field or in laboratory experiments. From the existing literature, we synthesized all data that we could find on ciliates. Some sources might be missing but none were purposefully ignored. Field data on microzooplankton grazing are mostly comprised of grazing rate using the dilution technique with a 24h incubation period. Laboratory grazing and growth data are focused on pelagic ciliates and heterotrophic dinoflagellates. The experiment measured grazing or growth as a function of prey concentration or at saturating prey concentration (maximal grazing rate). When considering every single data point available (each measured rate for a defined predator-prey pair and a certain prey concentration) there is a total of 1485 data points for the ciliates, counting experiments that measured growth and grazing simultaneously as 1 data point.

Cite
Himel Sarder (2025). Retail Product Dataset with Missing Values [Dataset]. https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values

Retail Product Dataset with Missing Values

A dataset with numerical and categorical values and structured missing data for analysis

Available download formats
zip (47826 bytes)
Dataset updated
Feb 17, 2025
Authors
Himel Sarder
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).

The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings from 1 to 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage

This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
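
As a rough illustration of the missing-data handling this dataset is meant for, the sketch below measures the share of missing values per column and applies a simple median/mode fill. The column names come from the description above; the sample rows, the use of `None` for a missing value, the helper names, and the choice of median and mode imputation are assumptions for the example, not part of the dataset.

```python
from collections import Counter
from statistics import median

# Hypothetical rows standing in for the downloaded file; None marks a missing value.
rows = [
    {"Category": "A", "Price": 10.0, "Rating": 4.0, "Stock": "In Stock", "Discount": 5.0},
    {"Category": None, "Price": None, "Rating": 3.0, "Stock": "Out of Stock", "Discount": None},
    {"Category": "B", "Price": 30.0, "Rating": None, "Stock": None, "Discount": 10.0},
    {"Category": "A", "Price": 20.0, "Rating": 5.0, "Stock": "In Stock", "Discount": None},
]

NUMERIC = ("Price", "Rating", "Discount")
CATEGORICAL = ("Category", "Stock")


def missing_share(rows):
    """Percentage of missing values in each column."""
    n = len(rows)
    return {col: 100.0 * sum(r[col] is None for r in rows) / n for col in rows[0]}


def impute(rows):
    """Fill numeric gaps with the column median and categorical gaps with the mode."""
    fill = {}
    for col in NUMERIC:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in CATEGORICAL:
        observed = Counter(r[col] for r in rows if r[col] is not None)
        fill[col] = observed.most_common(1)[0][0]
    return [{c: (fill[c] if v is None else v) for c, v in r.items()} for r in rows]


shares = missing_share(rows)
cleaned = impute(rows)
```

With columns that are up to 63% missing, as described above, a single-value fill like this can distort the distribution noticeably, so it is best treated as a baseline to compare other approaches against.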
