100+ datasets found

Retail Product Dataset with Missing Values
kaggle.com
zip
Updated Feb 17, 2025
Cite
Himel Sarder (2025). Retail Product Dataset with Missing Values [Dataset]. https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values
Explore at:
zip (47826 bytes)
Dataset updated
Feb 17, 2025
Authors
Himel Sarder
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).

The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings from 1 to 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage

This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
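As a first step in that kind of practice, the share of missing values per column can be measured before any imputation is chosen. The sketch below is only illustrative: it uses the standard library, a small made-up sample with the five column names listed above, and an assumed set of tokens that count as "missing" (the real file may encode gaps differently).

```python
import csv
import io

# Tokens treated as missing. This set is an assumption; adjust it to the real file.
MISSING_TOKENS = {"", "na", "nan", "null", "none"}


def missing_share(csv_text):
    """Return {column: fraction of rows whose value is missing}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {col: 0 for col in columns}
    total = 0
    for row in reader:
        total += 1
        for col in columns:
            # Short rows yield None for absent cells, so normalise to "".
            value = (row[col] or "").strip().lower()
            if value in MISSING_TOKENS:
                missing[col] += 1
    return {col: (missing[col] / total if total else 0.0) for col in columns}


# Made-up rows for illustration only; these are not taken from the dataset.
SAMPLE = """Category,Price,Rating,Stock,Discount
A,19.99,4.5,In Stock,10
B,,3.0,Out of Stock,
,5.50,,In Stock,5
D,12.00,2.0,,
"""

if __name__ == "__main__":
    for column, share in missing_share(SAMPLE).items():
        print(f"{column}: {share:.0%} missing")
```

Comparing these per-column shares against the percentages quoted in the description is a quick sanity check that the file was parsed as intended.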
Replication data for: A Unified Approach To Measurement Error And Missing...
dataverse.harvard.edu
search.dataone.org
Updated Nov 17, 2016
Cite
Matthew Blackwell; James Honaker; Gary King (2016). Replication data for: A Unified Approach To Measurement Error And Missing Data: Overview [Dataset]. http://doi.org/10.7910/DVN/29606
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/29606
Dataset updated
Nov 17, 2016
Dataset provided by
Harvard Dataverse
Authors
Matthew Blackwell; James Honaker; Gary King
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.7910/DVN/29606
Description
Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2014b). Notes: This is the first of two articles to appear in the same issue of the same journal by the same authors. The second is “A Unified Approach to Measurement Error and Missing Data: Details and Extensions.” See also: Missing Data
Data from: A hierarchical Bayesian approach for handling missing...
data.niaid.nih.gov
datadryad.org
zip
Updated Mar 22, 2019
Cite
Alison C. Ketz; Therese L. Johnson; Mevin B. Hooten; M. Thompson Hobbs (2019). A hierarchical Bayesian approach for handling missing classification data [Dataset]. http://doi.org/10.5061/dryad.8h36t01
Explore at:
zip
Unique identifier
https://doi.org/10.5061/dryad.8h36t01
Dataset updated
Mar 22, 2019
Dataset provided by
National Park Service
Colorado State University
Authors
Alison C. Ketz; Therese L. Johnson; Mevin B. Hooten; M. Thompson Hobbs
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Southwest US
Description
Ecologists use classifications of individuals in categories to understand composition of populations and communities. These categories might be defined by demographics, functional traits, or species. Assignment of categories is often imperfect, but frequently treated as observations without error. When individuals are observed but not classified, these “partial” observations must be modified to include the missing data mechanism to avoid spurious inference.

We developed two hierarchical Bayesian models to overcome the assumption of perfect assignment to mutually exclusive categories in the multinomial distribution of categorical counts, when classifications are missing. These models incorporate auxiliary information to adjust the posterior distributions of the proportions of membership in categories. In one model, we use an empirical Bayes approach, where a subset of data from one year serves as a prior for the missing data the next. In the other approach, we use a small random sample of data within a year to inform the distribution of the missing data.

We performed a simulation to show the bias that occurs when partial observations were ignored and demonstrated the altered inference for the estimation of demographic ratios. We applied our models to demographic classifications of elk (Cervus elaphus nelsoni) to demonstrate improved inference for the proportions of sex and stage classes.

We developed multiple modeling approaches using a generalizable nested multinomial structure to account for partially observed data that were missing not at random for classification counts. Accounting for classification uncertainty is important to accurately understand the composition of populations and communities in ecological studies.
PPT4J - Data
zenodo.org
bin, txt
Updated Dec 20, 2023
Cite
Zhiyuan Pan (2023). PPT4J - Data [Dataset]. http://doi.org/10.5281/zenodo.10397354
Explore at:
bin, txt
Unique identifier
https://doi.org/10.5281/zenodo.10397354
Dataset updated
Dec 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zhiyuan Pan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Download all files to get the dataset. Please check the MD5 checksums after download.

Run cat ppt4j_data.tar.xz.part* > ppt4j_data.tar.xz to get the complete archive.

Run awk '{print $2 " " $1}' MD5.txt > MD5.chk && md5sum --ignore-missing --check MD5.chk to check the integrity of downloaded files. The format of MD5.txt is not compatible with md5sum, so the awk command is employed to fix this. Sorry for the inconvenience.

Extract ppt4j_data.tar.xz, then follow the instructions at https://github.com/pan2013e/ppt4j. The tarball was created on macOS with bsdtar, so you may see warnings like tar: Ignoring unknown extended header keyword 'XXX' if you extract it on Linux with GNU tar. You can ignore these warnings as long as the checksum is okay.
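Where the shell tools are not available, the same join-and-verify steps could be approximated in Python. This is a sketch under stated assumptions, not the author's procedure: it assumes each line of MD5.txt has the form "name md5" (which is what the awk column swap suggests) and that file names contain no spaces. The helper names join_parts and verify are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large archive parts do not fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def join_parts(base_dir, pattern, output_name):
    """Concatenate part files in sorted name order, like `cat parts* > output`."""
    base = Path(base_dir)
    parts = sorted(base.glob(pattern))  # resolve the list before creating output
    output = base / output_name
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return output


def verify(listing_path, base_dir):
    """Check files against a listing of '<name> <md5>' lines (assumed format).

    Returns {name: True/False}. Files absent from base_dir are skipped,
    mirroring md5sum --ignore-missing.
    """
    results = {}
    for line in Path(listing_path).read_text().splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, expected = fields
        target = Path(base_dir) / name
        if target.exists():
            results[name] = md5_of(target) == expected.lower()
    return results


if __name__ == "__main__":
    # Self-contained demo on throwaway files; the real archive is not touched.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "demo.part00").write_bytes(b"hello ")
        (base / "demo.part01").write_bytes(b"world")
        joined = join_parts(base, "demo.part*", "demo.bin")
        (base / "MD5.txt").write_text(f"demo.bin {md5_of(joined)}\n")
        print(verify(base / "MD5.txt", base))
```

For the real download, the pattern would be ppt4j_data.tar.xz.part* and the output ppt4j_data.tar.xz, as in the commands above.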
Identifying Heat Waves in Florida: Considerations of Missing Weather Data
figshare.com
docx
Updated May 31, 2023
Cite
Emily Leary; Linda J. Young; Chris DuClos; Melissa M. Jordan (2023). Identifying Heat Waves in Florida: Considerations of Missing Weather Data [Dataset]. http://doi.org/10.1371/journal.pone.0143471
Explore at:
docx
Unique identifier
https://doi.org/10.1371/journal.pone.0143471
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Emily Leary; Linda J. Young; Chris DuClos; Melissa M. Jordan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Florida
Description
Background: Using current climate models, regional-scale changes for Florida over the next 100 years are predicted to include warming over terrestrial areas and very likely increases in the number of high temperature extremes. No uniform definition of a heat wave exists. Most past research on heat waves has focused on evaluating the aftermath of known heat waves, with minimal consideration of missing exposure information.

Objectives: To identify and discuss methods of handling and imputing missing weather data and how those methods can affect identified periods of extreme heat in Florida.

Methods: In addition to ignoring missing data, temporal, spatial, and spatio-temporal models are described and utilized to impute missing historical weather data from 1973 to 2012 from 43 Florida weather monitors. Calculated thresholds are used to define periods of extreme heat across Florida.

Results: Modeling of missing data and imputing missing values can affect the identified periods of extreme heat, through the missing data itself or through the computed thresholds. The differences observed are related to the amount of missingness during June, July, and August, the warmest months of the warm season (April through September).

Conclusions: Missing data considerations are important when defining periods of extreme heat. Spatio-temporal methods are recommended for data imputation. A heat wave definition that incorporates information from all monitors is advised.
Comprehensive EDA of Residential Features
kaggle.com
zip
Updated Nov 23, 2025
Cite
Coding expert G.N (2025). Comprehensive EDA of Residential Features [Dataset]. https://www.kaggle.com/datasets/ranaghulamnabi/comprehensive-eda-of-residential-features
Explore at:
zip (4762 bytes)
Dataset updated
Nov 23, 2025
Authors
Coding expert G.N
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context:

The dataset contains 545 entries (rows) and 13 features (columns). It is a clean dataset with no missing values in any column, so you can skip the standard null-value imputation step. The dataset consists of 7 numerical columns and 6 categorical columns (including the target price). Given that the data is clean, the best next step is to start your Exploratory Data Analysis (EDA).

Feature Distribution:
Summary statistics of 581 patients’ number of observations
plos.figshare.com
xls
Updated Apr 21, 2025
Cite
Ji Soo Kim; Ami A. Shah; Laura K. Hummers; Scott L. Zeger (2025). Summary statistics of 581 patients’ number of observations [Dataset]. http://doi.org/10.1371/journal.pone.0320414.t001
Explore at:
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0320414.t001
Dataset updated
Apr 21, 2025
Dataset provided by
PLOS ONE
Authors
Ji Soo Kim; Ami A. Shah; Laura K. Hummers; Scott L. Zeger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary statistics of 581 patients’ number of observations
ComBat HarmonizR enables the integrated analysis of independently generated...
ebi.ac.uk
Updated May 23, 2022
Cite
Hannah Voß (2022). ComBat HarmonizR enables the integrated analysis of independently generated proteomic datasets through data harmonization with appropriate handling of missing values [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD027467
Dataset updated
May 23, 2022
Authors
Hannah Voß
Variables measured
Proteomics
Description
The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome limitations in statistically underpowered sample cohorts but has not been demonstrated to this day. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can deal with missing at random (MAR) and missing not at random (MNAR) type values at the same time. Algorithms for batch effect removal, such as the ComBat algorithm, commonly used for other omics types, disregard proteins with MNAR missing values and reduce the informational yield and the effect size for combined datasets significantly. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, that performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior for removing batch effects between different Tandem Mass Tag (TMT)-plexes, compared to commonly used internal reference scaling (iRS). Due to the matrix dissection approach without the need of data imputation, the HarmonizR algorithm can be applied to any type of -omics data while assuring minimal data loss.
Men's Year-End ATP Rankings
kaggle.com
zip
Updated Jan 15, 2023
Cite
The Devastator (2023). Men's Year-End ATP Rankings [Dataset]. https://www.kaggle.com/datasets/thedevastator/men-s-year-end-atp-rankings-1972-2016
Explore at:
zip (643896 bytes)
Dataset updated
Jan 15, 2023
Authors
The Devastator
Description
Men's Year-End ATP Rankings

A Global Perspective

By Granger Huntress

About this dataset

This dataset provides a comprehensive look at the world of men's professional tennis throughout the Open Era. Every year, a new crop of tennis players has emerged to challenge long-standing traditions, while others have continued to maintain their place near the top. Through this dataset you will uncover which players succeeded in reaching or maintaining their ranking positions in the record books and how they navigated through changing eras in men's professional tennis. Dive deep into what makes these successful athletes stand out from the rest year after year, using the fields in this collection: first name, birthdate, country of origin, handedness, date range for records kept and, more importantly, their ATP career end rankings. Whether you are interested in a snapshot view to analyze long-term trends or want insights on why top players succeed, this dataset provides resources to explore men's ATP rankings throughout the Open Era.

How to use the dataset

This dataset is a comprehensive source of men's year-end rankings during the Open Era. Each record includes information on the player's ranking, name, birthdate, country of origin, and handedness. This dataset can be used to study the trends in professional tennis throughout this time period and analyze how they have changed over time.

To use this dataset effectively one should first explore the data by reviewing some basic statistics. Examples include summary statistics such as total players by country or average ranking across years. Summarizing the data will help get a quick understanding of what the data is composed of and any existing patterns that may be present in it.

Another important step before deeper analysis is to check for missing values or outliers that could affect your results if ignored or handled inappropriately. Understanding such issues early can save you from misinterpreting results later because of an incomplete analysis process.
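A basic summary and a missing-value check can be combined in one pass. The sketch below counts rows per value of one column and places blank cells in their own bucket. It uses the HAND column named in the column listing for ltdPlayerMaster.csv; the sample rows and the helper name count_by are made up for illustration, and the real file may encode handedness differently (for example as single letters).

```python
import csv
import io
from collections import Counter


def count_by(csv_text, column, missing_label="(missing)"):
    """Count rows per value of `column`, bucketing blank cells separately."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        counts[value or missing_label] += 1
    return counts


# Made-up rows for illustration only; these are not real players.
SAMPLE = """FIRST,LAST,HAND
Ann,Example,Right
Bo,Sample,Left
Cy,Placeholder,
Di,Mock,Right
"""

if __name__ == "__main__":
    for value, n in count_by(SAMPLE, "HAND").most_common():
        print(f"{value}: {n}")
```

The same function could be pointed at any other categorical column once its name is known from the file header.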

Once you have an overview of the dataset and have addressed potential issues, you can begin a more detailed exploration of the insights the data holds, answering questions about professional tennis during this period such as: How did various nations perform over different years? Who was consistently ranked among the top 10 players throughout this period? Are there any trends associated with handedness? Answering questions like these properly requires finding appropriate ways to analyze them given the available variables, so keep that in mind when trying to pin down connections between variables using techniques such as correlations or linear regression. Visualizations can also help you make sense of complex multivariable relationships that may exist between several parameters at once, so include them whenever possible. This way you can maximize accuracy when uncovering patterns in both individual components and overall summary statistics for tennis rankings across the years covered within the Open Era.

Research Ideas

Analyzing the global trends in men's tennis in the Open Era over time by examining shifts in countries represented at each year-end ranking.

Examining the effectiveness of different opponents based on their handedness (right or left) when compared to men's handedness throughout the Open Era.

Tracking and predicting future player rankings based on birthdates, country, and other relevant factors that influence performance.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

Unknown License - Please check the dataset description for more information.

Columns

File: ltdPlayerMaster.csv

| Column name | Description |
|:--------------|:---------------------------------------------------|
| FIRST | First name of the player. (String) |
| LAST | Last name of the player. (String) |
| HAND | Handedness of the player (Right or Left). ...
Virtual Sensors: Efficiently Estimating Missing Spectra - Dataset - NASA...
data.nasa.gov
Updated Mar 31, 2025
Cite
nasa.gov (2025). Virtual Sensors: Efficiently Estimating Missing Spectra - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/virtual-sensors-efficiently-estimating-missing-spectra
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Various instruments are used to create images of the Earth and other objects in the universe in a diverse set of wavelength bands with the aim of understanding natural phenomena. Sometimes these instruments are built in a phased approach, with additional measurement capabilities added in later phases. In other cases, technology may mature to the point that the instrument offers new measurement capabilities that were not planned in the original design of the instrument. In still other cases, high resolution spectral measurements may be too costly to perform on a large sample and therefore lower resolution spectral instruments are used to take the majority of measurements. Many applied science questions that are relevant to the earth science remote sensing community require analysis of enormous amounts of data that were generated by instruments with disparate measurement capabilities. This paper addresses this problem using Virtual Sensors: a method that uses models trained on spectrally rich (high spectral resolution) data to "fill in" unmeasured spectral channels in spectrally poor (low spectral resolution) data. The models we use in this paper are Multi-Layer Perceptrons (MLPs), Support Vector Machines (SVMs) with Radial Basis Function (RBF) kernels and SVMs with Mixture Density Mercer Kernels (MDMK). We demonstrate this method by using models trained on the high spectral resolution Terra MODIS instrument to estimate what the equivalent of the MODIS 1.6 micron channel would be for the NOAA AVHRR/2 instrument. The scientific motivation for the simulation of the 1.6 micron channel is to improve the ability of the AVHRR/2 sensor to detect clouds over snow and ice.
Bandit algorithms defined by allocation probability πk,t or index value...
plos.figshare.com
xls
Updated Jun 16, 2023
Cite
Xijin Chen; Kim May Lee; Sofia S. Villar; David S. Robertson (2023). Bandit algorithms defined by allocation probability πk,t or index value Ik,t. [Dataset]. http://doi.org/10.1371/journal.pone.0274272.t001
Explore at:
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0274272.t001
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Xijin Chen; Kim May Lee; Sofia S. Villar; David S. Robertson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Bandit algorithms defined by allocation probability πk,t or index value Ik,t.
Data from: Rewilded mammal assemblages reveal the missing ecological...
data-staging.niaid.nih.gov
data.niaid.nih.gov
+1 more
zip
Updated Jul 25, 2018
Cite
Charlotte H. Mills; Christopher E. Gordon; Mike Letnic (2018). Rewilded mammal assemblages reveal the missing ecological functions of granivores [Dataset]. http://doi.org/10.5061/dryad.c565c
Explore at:
zip
Unique identifier
https://doi.org/10.5061/dryad.c565c
Dataset updated
Jul 25, 2018
Dataset provided by
University of Wollongong
UNSW Sydney
Authors
Charlotte H. Mills; Christopher E. Gordon; Mike Letnic
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Central Australia, Scotia, Roxby Downs
Description
Rewilding is a strategy for ecological restoration that uses reintroductions of animals to re-establish the ecological functions of keystone species. Globally, rewilding efforts have focused primarily on reinstating the ecological functions of charismatic megafauna. In Australia, rewilding efforts have focused on restoring the ecological functions of herbivorous and omnivorous rodents and marsupials weighing between 30 and 5000 g inside of predator-proof exclosures.

In many arid ecosystems, mammals are considered the dominant seed predators. In Australian deserts, ants are considered to be the primary removers and predators of seeds and mammals unimportant removers and predators of seeds. However, most research on granivory in Australian deserts has occurred in areas where native mammals were functionally extinct.

Here, we compare rates of seed removal by mammals and ants on shrub seeds and abundance of shrub seedlings in two rewilded desert ecosystems (Arid Recovery Reserve and Scotia Wildlife Sanctuary) with adjacent areas possessing depauperate mammal faunas. We used foraging trays containing seeds of common native shrubs (Acacia ligulata and Dodonaea viscosa) to examine rates of seed removal by ants and mammals. We quantified the abundance of A. ligulata and D. viscosa seedlings inside and outside of rewilded areas along belt transects.

By excluding ants and mammals from foraging trays, we show that ants removed more seeds than mammals where mammal assemblages were depauperate, but mammals removed far more seeds than ants in rewilded areas. Shrub seedlings were more abundant in areas with depauperate mammal faunas than in rewilded areas.

Our study provides evidence that rewilding of desert mammal assemblages has restored the hitherto unappreciated ecological function of omnivorous rodents and bettongs as seed predators. We hypothesize that the loss of omnivorous mammals may be a factor that has facilitated shrub encroachment in arid Australia.

We contend that rewilding programs aimed at restoring ecological processes should not ignore consumers with relatively lower per capita consumptive effects. This is because consumers with low per capita consumptive effects often occur at high population densities or perform critical ecological functions and thus may have significant population level impacts that can be harnessed for ecological restoration.
Data from: PHYLACINE 1.2: The Phylogenetic Atlas of Mammal Macroecology
data-staging.niaid.nih.gov
datadryad.org
+1 more
zip
Updated May 11, 2019
Cite
Søren Faurby; Matt Davis; Rasmus Østergaard Pedersen; Simon D. Schowanek; Alexandre Antonelli; Jens-Christian Svenning (2019). PHYLACINE 1.2: The Phylogenetic Atlas of Mammal Macroecology [Dataset]. http://doi.org/10.5061/dryad.bp26v20
Explore at:
zip
Unique identifier
https://doi.org/10.5061/dryad.bp26v20
Dataset updated
May 11, 2019
Dataset provided by
Aarhus University
University of Gothenburg
Authors
Søren Faurby; Matt Davis; Rasmus Østergaard Pedersen; Simon D. Schowanek; Alexandre Antonelli; Jens-Christian Svenning
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Africa, Australia, South America, Europe, North America, Oceania, Global, Asia
Description
Data needed for macroecological analyses are difficult to compile and often hidden away in supplementary material under non-standardized formats. Phylogenies, range data, and trait data often use conflicting taxonomies and require ad hoc decisions to synonymize species or fill in large amounts of missing data. Furthermore, most available data sets ignore the large impact that humans have had on species ranges and diversity. Ignoring these impacts can lead to drastic differences in diversity patterns and estimates of the strength of biological rules. To help overcome these issues, we assembled PHYLACINE, The Phylogenetic Atlas of Mammal Macroecology. This taxonomically integrated platform contains phylogenies, range maps, trait data, and threat status for all 5,831 known mammal species that lived since the last interglacial (~130,000 years ago until present). PHYLACINE is ready to use directly, as all taxonomy and metadata are consistent across the different types of data, and files are provided in easy-to-use formats. The atlas includes both maps of current species ranges and present natural ranges, which represent estimates of where species would live without anthropogenic pressures. Trait data include body mass and coarse measures of life habit and diet. Data gaps have been minimized through extensive literature searches and clearly labelled imputation of missing values. The PHYLACINE database will be archived here as well as hosted online so that users may easily contribute updates and corrections to continually improve the data. This database will be useful to any researcher who wishes to investigate large scale ecological patterns. Previous versions of the database have already provided valuable information and have, for instance, shown that megafauna extinctions caused substantial changes in vegetation structure and nutrient transfer patterns across the globe. All data and metadata provided here represent PHYLACINE Version 1.2.0.
Pattern of Human Concerns Data, 1957-1963 - Archival Version
search.gesis.org
Updated Feb 1, 2001
+ more versions
Cite
Cantril, Hadley (2001). Pattern of Human Concerns Data, 1957-1963 - Archival Version [Dataset]. http://doi.org/10.3886/ICPSR07023
Unique identifier
https://doi.org/10.3886/ICPSR07023
Dataset updated
Feb 1, 2001
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
GESIS search
Authors
Cantril, Hadley
License
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de441083
Description
Abstract (en): Of the 14 nations included in the original study, these data cover the following ten: Brazil, Cuba, Dominican Republic, India, Israel, Nigeria, Panama, United States, West Germany, and Yugoslavia. (The data for Egypt, Japan, the Philippines, and Poland are not available through ICPSR.) In India and Israel the interviews were conducted in two waves, with different samples. Besides ascertaining the usual personal information, the study employed a "Self-Anchoring Striving Scale," an open-ended scale asking the respondent to define hopes and fears for self and the nation, to determine the two extremes of a self-defined spectrum on each of several variables. After these subjective ratings were obtained, the respondents indicated their perceptions of where they and their nations stood on a hypothetical ladder at three different points in time. Demographic variables include the respondents' age, gender, marital status, and level of education. For more information on the samples, coding, and the means of measurement, see the related publication listed below. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Checked for undocumented or out-of-range codes. Adult population of Brazil, Cuba, Dominican Republic, India, Israel, Nigeria, Panama, United States, West Germany and Yugoslavia. Separate samples were drawn in each country. All samples were intended to be crossnational, except for the kibbutz sample in Israel. However, both India samples underrepresent females, and the sample from Cuba was drawn exclusively from urban areas. In addition, the samples from Brazil, Cuba, the Dominican Republic, India, Nigeria, Panama, and the United States were weighted to achieve the intended representation. 2006-01-12 All files were removed from dataset 13 and flagged as study-level files, so that they will accompany all downloads. (1) Because the original data format included some multiply punched variables, it is inappropriate to assume that the first response of a multiple response variable is more important than the rest: the current order of responses is an artifact of the technology used to record and recover them. It is even possible to have a missing data code followed by further substantive responses in some cases. (2) These data files were originally released separately, under ICPSR study numbers 7023-7031, 7085-7086, and 7258. They are now concatenated into one data collection as 7023. References in the codebooks to the old study numbers should be ignored. (3) The codebooks are also available together in one bound volume available upon request from ICPSR. (4) The codebook is provided by ICPSR as a Portable Document Format (PDF) file. The PDF file format was developed by Adobe Systems Incorporated and can be accessed using PDF reader software, such as Adobe Acrobat Reader. Information on how to obtain a copy of the Acrobat Reader is provided on the ICPSR Web site.
Philadelphia Properties and Assessment History
s.cnmilf.com
catalog.data.gov
Updated Mar 31, 2025
+ more versions
Cite
City of Philadelphia (2025). Philadelphia Properties and Assessment History [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/philadelphia-properties-and-assessment-history
Dataset updated
Mar 31, 2025
Dataset provided by
City of Philadelphia
Area covered
Philadelphia
Description
Some of the information in the open data files below may not yet reflect the data used to calculate the most recent tax year’s property value. If you see missing or incorrect info about your property, use this form to contact OPA to report the issue. Property characteristic and assessment history from the Office of Property Assessment for all properties in Philadelphia. See more information on how OPA assesses property and their reports on the quality of assessments. This data updates nightly. Please ignore the 'created by' date below - the date of August 2015 shows when this webpage, not the data, was created.
Data from: Exact Bayesian inference for animal movement in continuous time
datadryad.org
data.niaid.nih.gov
zip
Updated Aug 5, 2016
Cite
Paul G. Blackwell; Mu Niu; Mark S. Lambert; Scott D. LaPoint (2016). Exact Bayesian inference for animal movement in continuous time [Dataset]. http://doi.org/10.5061/dryad.mv02k
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.mv02k
Dataset updated
Aug 5, 2016
Dataset provided by
Dryad
Authors
Paul G. Blackwell; Mu Niu; Mark S. Lambert; Scott D. LaPoint
Time period covered
Aug 5, 2015
Area covered
United Kingdom
Description
GPS locations for an adult female wild boar: Sequence of locations for an adult female wild boar fitted with a GPS collar. Values are times in minutes and co-ordinates in metres (from an arbitrary origin). Data are extracted from the study described in Quy, R. J., Massei, G., Lambert, M. S., Coats, J., Miller, L. A., and Cowan, D. P. (2014) Effects of a GnRH vaccine on the movement and activity of free-living wild boar (Sus scrofa), Wildlife Research 41, 185-193. File: WildBoarMEE.txt

Data from: Quantifying changes in fish population stability using...

portal.edirepository.org

csv

Updated Feb 17, 2025

Cite

Jonathan Walter (2025). Quantifying changes in fish population stability using statistical early warnings of regime shifts [Dataset]. http://doi.org/10.6073/pasta/aa552e1f82f95a33c2ea657e3c0706e4

Available download formats: csv (81006 bytes)

Unique identifier

https://doi.org/10.6073/pasta/aa552e1f82f95a33c2ea657e3c0706e4

Dataset updated

Feb 17, 2025

Dataset provided by

EDI

Authors

Jonathan Walter

Time period covered

1980 - 2023

Variables measured

n, slope, metric, spunit, p_value, species, t_value, slope_se

Description

This data package describes long-term trends in metrics describing population stability and used as statistical early warnings of regime shifts in 29 fish species that inhabit the San Francisco Bay-Delta in central California, USA. Metrics used in this study include spatial synchrony, temporal coefficient of variation (CV), and lag-1 temporal autocorrelation. Trends were measured using ordinary least squares linear regression.
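As a rough illustration of the trend step, the sketch below fits an ordinary least squares line to a metric measured over time and returns quantities named after the listed variables `n`, `slope`, `slope_se`, and `t_value`. The function name `ols_trend` and the sample values are hypothetical and not taken from the data package; `p_value` is left out because it needs a t-distribution routine.

```python
import numpy as np

def ols_trend(years, values):
    """Fit values = intercept + slope * years by ordinary least squares."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 points for a slope and its standard error")
    x_centered = x - x.mean()
    sxx = np.sum(x_centered ** 2)
    slope = np.sum(x_centered * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sigma2 = np.sum(residuals ** 2) / (n - 2)
    slope_se = float(np.sqrt(sigma2 / sxx))
    t_value = float(slope / slope_se) if slope_se > 0 else float("nan")
    return {"n": int(n), "slope": float(slope), "slope_se": slope_se, "t_value": t_value}

# Made-up metric values, for illustration only.
years = [1984, 1985, 1986, 1987, 1988, 1989]
cv = [0.50, 0.56, 0.59, 0.66, 0.70, 0.74]
result = ols_trend(years, cv)
print(result)
```

A positive `slope` here would mean the metric rose over the period; whether that counts as a warning signal depends on the metric.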

These derived data were developed from abundance (as CPUE) time series based on three long-term fish monitoring studies included in https://doi.org/10.6073/pasta/a29a6e674b0f8797e13fbc4b08b92e5b: the Fall Midwater Trawl Survey, Delta Juvenile Monitoring Program, and Bay Study. Selected data were from fall months (September to December) in 1980-2023, from midwater trawl and beach seine surveys for which sampling effort (e.g., tow volume) was recorded. Data on fish exceeding maximum length thresholds for age-0 fish were discarded, except for white sturgeon, where the maximum length threshold corresponded to approximately 10 years of age, the onset of reproductive maturity. Observations from different sampling stations were aggregated into 10 sub-regions (South San Francisco Bay, Central San Francisco Bay, San Pablo Bay, Napa River, Suisun Bay, Delta Confluence, South Delta, North Delta, San Joaquin River, Sacramento River). Midwater trawl samples and beach seine samples were considered separately because the methods sample distinct habitat types. Combinations of sub-region and sampling method were considered distinct spatial units.

Early warning indicator (EWI) metrics were measured in 5-year rolling windows to permit assessment of changes over time. The temporal CV and lag-1 autocorrelation were measured on individual spatial unit time series, ignoring windows with >1 year of missing data. The coefficient of variation divides the standard deviation by the mean. Lag-1 autocorrelation was measured as Pearson correlation. Spatial synchrony was measured across spatial units, ignoring spatial units with >1 year of missing data, and ignoring rolling windows where <3 spatial units had sufficient data. Spatial synchrony was measured as the mean of pairwise Spearman correlations. Trends in EWI metrics were measured only when there were at least 5 rolling window measurements spanning at least 10 years.
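The rolling-window metrics described above can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the example series, the use of the sample standard deviation (`ddof=1`), and the pairwise deletion of missing years inside a window are all assumptions.

```python
import numpy as np
from itertools import combinations

WINDOW = 5       # window length in years, as stated in the description
MAX_MISSING = 1  # windows with more than one missing year are skipped

def rank(values):
    """Average ranks (ties share the mean rank), used for Spearman correlation."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if len(a) < 2 or a.std() == 0 or b.std() == 0:
        return float("nan")
    return float(np.corrcoef(a, b)[0, 1])

def window_ok(w):
    return int(np.isnan(w).sum()) <= MAX_MISSING

def temporal_cv(w):
    x = w[~np.isnan(w)]
    mean = x.mean()
    return float(np.std(x, ddof=1) / mean) if mean != 0 else float("nan")

def lag1_autocorr(w):
    # Pearson correlation between x[t] and x[t+1], using pairs observed in both years.
    a, b = w[:-1], w[1:]
    keep = ~np.isnan(a) & ~np.isnan(b)
    return pearson(a[keep], b[keep])

def spatial_synchrony(windows):
    usable = [w for w in windows if window_ok(w)]
    if len(usable) < 3:  # fewer than 3 spatial units with sufficient data
        return float("nan")
    corrs = []
    for a, b in combinations(usable, 2):
        keep = ~np.isnan(a) & ~np.isnan(b)
        corrs.append(pearson(rank(a[keep]), rank(b[keep])))
    corrs = [c for c in corrs if not np.isnan(c)]
    return float(np.mean(corrs)) if corrs else float("nan")

def rolling_metrics(series_by_unit):
    """series_by_unit: dict of unit name -> yearly CPUE values (NaN = missing year)."""
    units = {k: np.asarray(v, dtype=float) for k, v in series_by_unit.items()}
    n_years = len(next(iter(units.values())))
    out = []
    for start in range(n_years - WINDOW + 1):
        windows = {k: v[start:start + WINDOW] for k, v in units.items()}
        out.append({
            "start": start,
            "cv": {k: temporal_cv(w) for k, w in windows.items() if window_ok(w)},
            "ar1": {k: lag1_autocorr(w) for k, w in windows.items() if window_ok(w)},
            "synchrony": spatial_synchrony(list(windows.values())),
        })
    return out

# Made-up series for three hypothetical spatial units over seven years.
example = {
    "unit_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "unit_b": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0],
    "unit_c": [1.5, np.nan, 3.5, 4.5, 5.5, 6.5, 7.5],
}
metrics = rolling_metrics(example)
```

Seven years of data yield three 5-year windows. The description does not say how missing years are treated inside a retained window, so the pairwise deletion used here is only one plausible choice.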

True Influence - Proprietary B2B Intent Data Feed (USA)
datarade.ai
.json, .xml
Updated Jun 18, 2020
Cite
True Influence (2020). True Influence - Proprietary B2B Intent Data Feed (USA) [Dataset]. https://datarade.ai/data-products/true-influence-proprietary-intent-data-feed
Available download formats: .json, .xml
Dataset updated
Jun 18, 2020
Dataset provided by
Anteriad, LLC
Authors
True Influence
Area covered
United States of America
Description
Our proprietary intent data is more expansive than what is available from data co-ops or single-source providers, delivering a comprehensive base for your advanced intent analysis. We monitor intent behavior by both executive and managerial customer personas, to help you develop a complete picture of an organization’s buying dynamics.

Our exclusive Identity Graph technology goes beyond simple reverse IP lookup to identify small and midsize companies that do not have dedicated IP addresses. Our advanced triangulation technologies are based on dozens of variables and pinpoint accounts, locations, and specific individuals who are expressing intent. This critical intent intelligence is either missing or ignored in most other data streams.

Our AI, machine learning, and natural language analysis of content identifies precise topical interest and maps intent activity to our taxonomy of more than 7,000 B2B topics. And we can easily add new topics based upon customer requirements.

The True Influence Relevance Engine™ analyzes intent activity on more than just frequency. We include activity type, topical relevance, and historical trends to find patterns that make intent a strategic differentiator for your solution. Our intent data can take your data-driven sales and marketing solution or service to the next level.
Bias Characterization in Probabilistic Genotype Data and Improved Signal...
plos.figshare.com
tiff
Updated Jun 1, 2023
Cite
Cameron Palmer; Itsik Pe’er (2023). Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation [Dataset]. http://doi.org/10.1371/journal.pgen.1006091
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pgen.1006091
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Cameron Palmer; Itsik Pe’er
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.
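To make the MI paradigm concrete, the sketch below pools per-imputation effect estimates with Rubin's rules, the standard combining step in multiple imputation. It is not the authors' pipeline: the function name `pool_rubin` and the effect sizes and standard errors are made up for illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine estimates from m imputed datasets using Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2:
        raise ValueError("need at least two imputations")
    pooled = q.mean()             # pooled point estimate
    within = u.mean()             # average within-imputation variance
    between = q.var(ddof=1)       # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return {"estimate": float(pooled), "within": float(within),
            "between": float(between), "total_variance": float(total)}

# Hypothetical per-imputation effect sizes and standard errors for one SNP.
betas = [0.10, 0.14, 0.12, 0.16, 0.08]
ses = [0.05, 0.05, 0.06, 0.05, 0.06]
pooled = pool_rubin(betas, [s ** 2 for s in ses])
print(pooled)
```

The between-imputation term is what carries genotype uncertainty into the final variance; analysing a single best-guess genotype call would drop it.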
Data compilation of ciliates growth rate, grazing rate and gross gowth...
doi.pangaea.de
service.tib.eu
html, tsv
Updated Jan 15, 2014
Cite
Sevrine Sailley; Christine Klaas (2014). Data compilation of ciliates growth rate, grazing rate and gross gowth efficiency from field and labratory experiments [Dataset]. http://doi.org/10.1594/PANGAEA.826106
Available download formats: html, tsv
Unique identifier
https://doi.org/10.1594/PANGAEA.826106
Dataset updated
Jan 15, 2014
Dataset provided by
PANGAEA
Authors
Sevrine Sailley; Christine Klaas
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Time period covered
Oct 3, 1979
Variables measured
Taxon/taxa, Event label, Cell biovolume, Carbon per cell, Reference/source, Latitude of event, Longitude of event, Treatment: temperature, Gross growth efficiency, Ciliates, cell biovolume, and 7 more
Description
The present data compilation includes ciliate growth rate, grazing rate and gross growth efficiency determined either in the field or in laboratory experiments. From the existing literature, we synthesized all data that we could find on ciliates. Some sources might be missing but none were purposefully ignored. Field data on microzooplankton grazing mostly comprise grazing rates measured using the dilution technique with a 24 h incubation period. Laboratory grazing and growth data are focused on pelagic ciliates and heterotrophic dinoflagellates. The experiments measured grazing or growth as a function of prey concentration or at saturating prey concentration (maximal grazing rate). When considering every single data point available (each measured rate for a defined predator-prey pair and a certain prey concentration) there is a total of 1485 data points for the ciliates, counting experiments that measured growth and grazing simultaneously as 1 data point.

