100+ datasets found
  1. Data from: Evaluating Supplemental Samples in Longitudinal Research:...

    • tandf.figshare.com
    txt
    Updated Feb 9, 2024
    Laura K. Taylor; Xin Tong; Scott E. Maxwell (2024). Evaluating Supplemental Samples in Longitudinal Research: Replacement and Refreshment Approaches [Dataset]. http://doi.org/10.6084/m9.figshare.12162072.v1
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Laura K. Taylor; Xin Tong; Scott E. Maxwell
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Despite the wide application of longitudinal studies, they are often plagued by missing data and attrition. The majority of methodological approaches focus on participant retention or modern missing data analysis procedures. This paper, however, takes a new approach by examining how researchers may supplement the sample with additional participants. First, refreshment samples use the same selection criteria as the initial study. Second, replacement samples identify auxiliary variables that may help explain patterns of missingness and select new participants based on those characteristics. A simulation study compares these two strategies for a linear growth model with five measurement occasions. Overall, the results suggest that refreshment samples lead to less relative bias, greater relative efficiency, and more acceptable coverage rates than replacement samples or not supplementing the missing participants in any way. Refreshment samples also have high statistical power. The comparative strengths of the refreshment approach are further illustrated through a real data example. These findings have implications for assessing change over time when researching at-risk samples with high levels of permanent attrition.

  2. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  3. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
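    The decision tree settings listed above (gain ratio criterion, a minimum of 5 samples to split, stopping once the majority class reaches 95%) can be illustrated with a small sketch. This is not Orange's implementation; the function names `entropy`, `gain_ratio` and `should_stop` are hypothetical, and the split is assumed to be a binary threshold on one numeric feature.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of a binary split `value <= threshold` on one numeric feature."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info

def should_stop(labels, min_split=5, majority=0.95):
    """Stop when the node is too small to split or the majority class dominates."""
    n = len(labels)
    top_count = Counter(labels).most_common(1)[0][1]
    return n < min_split or top_count / n >= majority

# Hypothetical toy data: a threshold of 2 separates the two classes perfectly.
ratio = gain_ratio([1, 2, 3, 4], ["a", "a", "b", "b"], threshold=2)
```

    Gain ratio divides information gain by the entropy of the split itself, which penalizes splits that produce many small partitions.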

  4. Patent AT-E401562-T1: [Translated] DEVICE AND METHOD FOR SAMPLING FOR...

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 8, 2025
    National Center for Biotechnology Information (NCBI) (2025). Patent AT-E401562-T1: [Translated] DEVICE AND METHOD FOR SAMPLING FOR SURFACE ANALYSIS [Dataset]. https://catalog.data.gov/dataset/patent-at-e401562-t1-translated-device-and-method-for-sampling-for-surface-analysis
    Dataset provided by
    National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
    Description

    A sampling device comprising an extraction body (3) with a surface (31), which is partially designed as an abrasion area (1), where the abrasion area consists of microcavities (11) that are regularly or randomly arranged and designed as recesses, where the micro cavities have a smaller depth in a micrometer range and the abrasion area is coated or uncoated depending on application, and the micro cavities are formed by charging, is new. An independent claim is also included for abrasion of a microsample on a surface.

  5. Global Sampling And Analysis Limited Export Import Data | Eximpedia

    • eximpedia.app
    Updated Feb 7, 2025
    (2025). Global Sampling And Analysis Limited Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/companies/global-sampling-and-analysis-limited/06461470
    Description

    Global Sampling And Analysis Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  6. NARSTO Texas Particulate Matter (PM) 2.5 Sampling and Analysis Study:...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 19, 2025
    NASA/LARC/SD/ASDC (2025). NARSTO Texas Particulate Matter (PM) 2.5 Sampling and Analysis Study: 1997-1998 Data [Dataset]. https://catalog.data.gov/dataset/narsto-texas-particulate-matter-pm-2-5-sampling-and-analysis-study-1997-1998-data-aa00b
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    Texas
    Description

    NARSTO_Texas_PM2.5_Sampling_and_Analysis_Study_1997-1998_ is the North American Research Strategy for Tropospheric Ozone (NARSTO) Texas Particulate Matter (PM) 2.5 Sampling and Analysis Study: 1997-1998 Data. The data for this product was collected from March 11, 1997 to March 12, 1998. The City of Houston, the Texas Natural Resource Conservation Commission (TNRCC), and the Houston Regional Monitoring Network sponsored sampling and analysis of PM2.5 samples taken over the course of one year, from March 11, 1997 to March 12, 1998. Objectives of the study were to determine the levels and chemical composition of PM2.5 in Houston and other cities in Texas and to determine the background levels and chemical composition of PM2.5 transported into Houston. During the sampling effort, 24-hour PM2.5 mass measurements were acquired from 15 sites throughout the state of Texas, using DRI's MEDVOL particle samples. All of the Teflon filters were analyzed for mass by gravimetry and a selected subset of the Teflon and quartz fiber filters were subjected to full chemical analysis. These measurements were taken in anticipation of the U.S. EPA revising PM2.5 and PM10 NAAQS. These results could be used to establish background PM conditions and determine compliance with new PM standards. Various sampler configurations allow evaluation of data precision, accuracy, and validity. NARSTO, which has since disbanded, was a public/private partnership, whose membership spanned across government, utilities, industry, and academe throughout Mexico, the United States, and Canada. The primary mission was to coordinate and enhance policy-relevant scientific research and assessment of tropospheric pollution behavior; activities provide input for science-based decision-making and determination of workable, efficient, and effective strategies for local and regional air-pollution management. Data products from local, regional, and international monitoring and research programs are still available.

  7. Household Health Survey 2012-2013, Economic Research Forum (ERF)...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Jun 26, 2017
    Kurdistan Regional Statistics Office (KRSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://datacatalog.ihsn.org/catalog/6937
    Dataset provided by
    Economic Research Forum
    Kurdistan Regional Statistics Office (KRSO)
    Central Statistical Organization (CSO)
    Time period covered
    2012 - 2013
    Area covered
    Iraq
    Description

    Abstract

    The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

    ----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

    Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO), household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the CSO and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year and covered all governorates, including those in the Kurdistan Region.

    The survey has six main objectives. These objectives are:

    1. Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.
    2. Provide a comprehensive data system to assess household social and economic conditions and prepare the indicators related to human development.
    3. Provide data that meet the needs and requirements of national accounts.
    4. Provide detailed indicators on consumption expenditure that support decision-making related to production, consumption, export and import.
    5. Provide detailed indicators on the sources of household and individual income.
    6. Provide the data necessary for formulating a new consumer price index number.

    The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

    Geographic coverage

    National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    ----> Design:

    The sample size was 25488 households for the whole of Iraq: 216 households in each of the 118 districts, drawn as 2832 clusters of 9 households each and distributed across districts and governorates for rural and urban areas.

    ----> Sample frame:

    The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size, giving 2832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster. The total sample size was therefore 25488 households distributed across the governorates, with 216 households in each district.

    ----> Sampling Stages:

    In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to an implicit breakdown by urban and rural area and a geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units, so a sampling unit is selected more than once; where necessary, two or more clusters may be drawn from the same enumeration unit.
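    The two-stage design described above can be sketched for a single district as follows. This is an illustrative sketch only, not the survey's actual procedure: the block sizes are randomly generated placeholders, and the function names `pps_systematic` and `systematic_sample` are hypothetical. The closing arithmetic reproduces the totals stated in the text (24 sample points of 9 households in each of 118 districts).

```python
import random

def pps_systematic(sizes, n_select, rng):
    """Stage 1: systematic selection with probability proportional to size.

    Returns the indices of the selected units. A unit larger than the
    sampling step can be selected more than once.
    """
    step = sum(sizes) / n_select
    start = rng.uniform(0, step)
    picks, cum, i = [], 0, 0
    for k in range(n_select):
        target = start + k * step
        # Advance to the unit whose cumulative size range contains the target.
        while i < len(sizes) - 1 and cum + sizes[i] <= target:
            cum += sizes[i]
            i += 1
        picks.append(i)
    return picks

def systematic_sample(items, n_select, rng):
    """Stage 2: systematic equal-probability selection from a list."""
    step = len(items) / n_select
    start = rng.uniform(0, step)
    last = len(items) - 1
    return [items[min(int(start + k * step), last)] for k in range(n_select)]

rng = random.Random(0)
# Hypothetical household counts for 60 blocks in one district.
block_sizes = [rng.randint(50, 400) for _ in range(60)]

clusters = pps_systematic(block_sizes, 24, rng)
households = [
    systematic_sample(list(range(block_sizes[b])), 9, rng) for b in clusters
]

per_district = sum(len(h) for h in households)  # 24 clusters * 9 households
total_households = 118 * per_district           # 118 districts
total_clusters = 118 * len(clusters)
```

    With these inputs, `per_district` is 216, `total_clusters` is 2832 and `total_households` is 25488, matching the figures in the design description.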

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    ----> Preparation:

    The questionnaire of the 2006 survey was used as the basis for the 2012 questionnaire, to which many revisions were made. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was used in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, producing the final version used in the actual survey.

    ----> Questionnaire Parts:

    The questionnaire consists of four parts, each with several sections:

    Part 1: Socio-Economic Data:
    - Section 1: Household Roster
    - Section 2: Emigration
    - Section 3: Food Rations
    - Section 4: Housing
    - Section 5: Education
    - Section 6: Health
    - Section 7: Physical measurements
    - Section 8: Job seeking and previous job

    Part 2: Monthly, Quarterly and Annual Expenditures:
    - Section 9: Expenditures on Non-Food Commodities and Services (past 30 days).
    - Section 10: Expenditures on Non-Food Commodities and Services (past 90 days).
    - Section 11: Expenditures on Non-Food Commodities and Services (past 12 months).
    - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days).
    - Section 12, Table 1: Meals Had Within the Residential Unit.
    - Section 12, Table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

    Part 3: Income and Other Data:
    - Section 13: Job
    - Section 14: Paid jobs
    - Section 15: Agriculture, forestry and fishing
    - Section 16: Household non-agricultural projects
    - Section 17: Income from ownership and transfers
    - Section 18: Durable goods
    - Section 19: Loans, advances and subsidies
    - Section 20: Shocks and strategy of dealing in the households
    - Section 21: Time use
    - Section 22: Justice
    - Section 23: Satisfaction in life
    - Section 24: Food consumption during past 7 days

    Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages were allocated for recording the expenditures of each day, so the roster consists of 14 pages.

    Cleaning operations

    ----> Raw Data:

    Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
    1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
    2. Local Supervisor: Checks that questions have been correctly completed.
    3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
    4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

    ----> Harmonized Data:

    • The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.
    • The harmonization process starts with raw data files received from the Statistical Office.
    • A program is generated for each dataset to create harmonized variables.
    • Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

    Response rate

    The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).

  8. Data from: Supplemental Information

    • figshare.com
    Updated Mar 26, 2022
    Sweta Pradhan; Dr Neetha Shetty; Dr Deepa G Kamath (2022). Supplemental Information [Dataset]. http://doi.org/10.6084/m9.figshare.19401524.v2
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sweta Pradhan; Dr Neetha Shetty; Dr Deepa G Kamath
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data contains sample size calculation, statistical analysis and all the results included in this research.

  9. Replication Data for: Measuring transnational social fields through...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 19, 2023
    HANCEAN, MARIAN-GABRIEL; LUBBERS, MIRANDA JESSICA; MOLINA, JOSE LUIS (2023). Replication Data for: Measuring transnational social fields through binational link-tracing sampling [Dataset]. http://doi.org/10.7910/DVN/XDYGJD
    Dataset provided by
    Harvard Dataverse
    Authors
    HANCEAN, MARIAN-GABRIEL; LUBBERS, MIRANDA JESSICA; MOLINA, JOSE LUIS
    Description

    These are the data and code to replicate the analysis in our paper "Measuring transnational social fields through binational link-tracing sampling".

  10. Number of samples for each sampling interval.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Tongtong Liu; Zheng Yang; Yi Zhao; Chenshu Wu; Zimu Zhou; Yunhao Liu (2023). Number of samples for each sampling interval. [Dataset]. http://doi.org/10.1371/journal.pone.0207697.t002
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tongtong Liu; Zheng Yang; Yi Zhao; Chenshu Wu; Zimu Zhou; Yunhao Liu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of samples for each sampling interval.

  11. Data from: Data for Predictive Modelling of Laminated Composite Plates

    • data.niaid.nih.gov
    Updated Aug 5, 2024
    Kalita, Kanak; Chakraborty, Shankar; Madhu, S; Ramachandran, Manickam; Gao, Xiao-Zhi (2024). Data for Predictive Modelling of Laminated Composite Plates [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5069420
    Dataset provided by
    Department of Automobile Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, 602 105, India
    School of Computing, University of Eastern Finland, Kuopio FI-70211, Finland
    Data Analytics Lab, REST Labs, Kaveripattinam, Krishnagiri 635 112, India
    Department of Mechanical Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi 600 062, India
    Department of Production Engineering, Jadavpur University, Kolkata, 700 032, India
    Authors
    Kalita, Kanak; Chakraborty, Shankar; Madhu, S; Ramachandran, Manickam; Gao, Xiao-Zhi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Two different problems are considered: a low-dimensional (LD) problem and a high-dimensional (HD) problem. The LD problem has 2 variables for a 4-ply symmetric square composite laminate. Similarly, the HD problem has 16 variables for a 32-ply symmetric square composite laminate. The value of h for the LD and HD problems is taken as 0.005 and 0.04 respectively.

    For each problem, three different sampling techniques are adopted: random sampling (RS), Latin hypercube sampling (LHS) [1] and Hammersley sampling (HS) [2]. RS, LHS and HS differ primarily in how uniformly the sample points cover the design space: RS gives the least uniform and HS the most uniform distribution of sample points. Based on the recommendations of Jin et al. [3], and Zhao and Xue [4], 72 and 612 sample points are considered in each training dataset of the LD and HD problems respectively.
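    The difference in uniformity between random sampling and Latin hypercube sampling can be illustrated with a short sketch. This is not the authors' code: the function names are hypothetical, the design space is assumed to be the unit hypercube, and Hammersley sampling is omitted for brevity. The call uses 72 points in 2 dimensions to mirror the LD problem.

```python
import random

def random_sampling(n_points, n_dims, rng):
    """Plain random sampling: every coordinate is drawn independently."""
    return [tuple(rng.random() for _ in range(n_dims)) for _ in range(n_points)]

def latin_hypercube(n_points, n_dims, rng):
    """Latin hypercube sampling on the unit hypercube.

    Each dimension is cut into n_points equal strata, and every stratum
    receives exactly one point, in an independently shuffled order.
    """
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)
        # Jitter each point uniformly inside its assigned stratum.
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]

rng = random.Random(1)
lhs_points = latin_hypercube(72, 2, rng)
rs_points = random_sampling(72, 2, rng)

# Strata occupied along the first dimension: LHS fills all 72 by construction,
# while random sampling usually leaves some empty and doubles up on others.
lhs_strata = {int(p[0] * 72) for p in lhs_points}
rs_strata = {int(p[0] * 72) for p in rs_points}
```

    Comparing `len(lhs_strata)` with `len(rs_strata)` gives a quick, one-dimensional view of the uniformity gap described above.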

    Based on the FE formulation, several high-fidelity datasets for the LD and HD problems are generated, as presented in the Supplementary Material file “Predictive modelling of laminated composite plates.xlsx” in nine sheets organized as detailed in Table 1.

    References:

    1.  McKay, M. D.; Beckman, R. J.; Conover, W. J. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 2000, 42, 55-61.
      
    2.  Hammersley, J. M. Monte Carlo methods for solving multivariable problems. Annals of the New York Academy of Sciences, 1960, 86, 844-874.
      
    3.  Jin, R.; Chen, W.; Simpson, T. W. Comparative studies of metamodelling techniques under multiple modelling criteria. Structural and Multidisciplinary Optimization, 2001, 23, 1-13.
      
    4.  Zhao, D.; Xue, D. A comparative study of metamodeling methods considering sample quality merits. Structural and Multidisciplinary Optimization, 2010, 42, 923-938.
      
  12. New 1000 Sales Records Data 2

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
    Available download formats: zip (49305 bytes)
    Authors
    Calvin Oko Mensah
    Description

    This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
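    The derived columns named above could be computed along the following lines. This is a hedged sketch rather than the uploader's actual code: the input column names ("Order Date", "Ship Date", "Unit Price", "Unit Cost"), the ISO date format, and the definition of unit margin as price minus cost are all assumptions, and the sample record is made up.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def add_derived_columns(record):
    """Return a copy of one sales record with the five derived columns added."""
    order = date.fromisoformat(record["Order Date"])  # assumed ISO format
    ship = date.fromisoformat(record["Ship Date"])
    out = dict(record)
    # Assumed definition: margin per unit is selling price minus cost.
    out["Unit margin"] = round(record["Unit Price"] - record["Unit Cost"], 2)
    out["Order year"] = order.year
    out["Order month"] = order.month
    out["Order weekday"] = WEEKDAYS[order.weekday()]  # Monday is index 0
    out["Order_Ship_Days"] = (ship - order).days
    return out

# Made-up example record for illustration only.
row = add_derived_columns({
    "Order Date": "2023-01-12",
    "Ship Date": "2023-01-20",
    "Unit Price": 9.33,
    "Unit Cost": 6.92,
})
```

    Keeping the derivation in one function makes it easy to apply to every row read from the CSV, for example with `csv.DictReader` after converting the price fields to floats.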

  13. Global Retail Sales Data: Orders, Reviews & Trends

    • kaggle.com
    zip
    Updated Dec 10, 2024
    Adarsh Anil Kumar (2024). Global Retail Sales Data: Orders, Reviews & Trends [Dataset]. https://www.kaggle.com/datasets/adarsh0806/influencer-merchandise-sales
    Available download formats: zip (125403 bytes)
    Authors
    Adarsh Anil Kumar
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The Global Retail Sales Data provided here is a self-generated synthetic dataset created using Random Sampling techniques provided by the Numpy Package. The dataset emulates information regarding merchandise sales through a retail website set up by a popular fictional influencer based in the US between the '23-'24 period. The influencer would sell clothing, ornaments and other products at variable rates through the retail website to all of their followers across the world. Imagine that the influencer executes high levels of promotions for the materials they sell, prompting more ratings and reviews from their followers, pushing more user engagement.

    This dataset is placed to help with practicing Sentiment Analysis or/and Time Series Analysis of sales, etc. as they are very important topics for Data Analyst prospects. The column description is given as follows:

    Order ID: Serves as an identifier for each order made.

    Order Date: The date when the order was made.

    Product ID: Serves as an identifier for the product that was ordered.

    Product Category: Category of product sold (Clothing, Ornaments, Other).

    Buyer Gender: Genders of people that have ordered from the website (Male, Female).

    Buyer Age: Ages of the buyers.

    Order Location: The city where the order was made from.

    International Shipping: Whether the product was shipped internationally or not. (Yes/No)

    Sales Price: Price tag for the product.

    Shipping Charges: Extra charges for international shipments.

    Sales per Unit: Sales cost while including international shipping charges.

    Quantity: Quantity of the product bought.

    Total Sales: Total sales made through the purchase.

    Rating: User rating given for the order.

    Review: User review given for the order.

  14. None -

    • plos.figshare.com
    xls
    Updated Jun 23, 2025
    Mohd Nazim; Ali Abbas Falah Alzubi (2025). None - [Dataset]. http://doi.org/10.1371/journal.pone.0326735.t011
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Mohd Nazim; Ali Abbas Falah Alzubi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study intends to empower English as a Foreign Language (EFL) teachers’ perceptions of generative artificial intelligence (AI)-mediated self-professionalism in engagement, attitudes, constraints, and solutions. Employing the mixed methods research design, the researchers collected data from male and female teachers (N = 278) of eight public universities, utilizing convenience sampling and a set of instruments: a questionnaire and a semi-structured interview. The data analysis combined quantitative and qualitative methods, using SPSS version 26 for statistical analysis (Pearson correlation, Cronbach’s alpha, means, standard deviations), and thematic analysis for qualitative data, with data triangulation employed to compare questionnaire and interview responses for a comprehensive understanding of EFL teachers’ engagement with generative AI. The results revealed that the study sample engaged in self-professionalism at a medium level, yet they hold high attitudes toward generative AI-mediated self-professionalism. In addition, the content analysis exhibited several constraints, including technological competence and AI literacy, AI-generated content reliability and accuracy, ethical issues, and encroachment on professional autonomy. Moreover, the respondents proposed solutions such as offering AI-driven training programs, establishing clear ethical guidelines and protocols, emphasizing AI as a supplementary tool rather than a substitute, and implementing impartial access mechanisms for AI content to strengthen EFL teachers’ self-professionalism mediated by generative AI. Studies in the context of generative AI-driven self-professionalism appear limited, particularly in the context of Arab higher education institutions. This dearth of research presents an opportunity for the current study to make significant improvements in contributing innovative insights to the EFL teachers’ self-professionalism landscape.

  15. Multiple Indicator Cluster Survey 2005 - Jamaica

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Sep 26, 2013
    Cite
    Statistical Institute Of Jamaica (2013). Multiple Indicator Cluster Survey 2005 - Jamaica [Dataset]. https://microdata.worldbank.org/index.php/catalog/17
    Dataset updated
    Sep 26, 2013
    Dataset authored and provided by
    Statistical Institute Of Jamaica
    Time period covered
    2005 - 2006
    Area covered
    Jamaica
    Description

    Abstract

    The Multiple Indicator Cluster Survey (MICS) is a household survey programme developed by UNICEF to assist countries in filling data gaps for monitoring human development in general and the situation of children and women in particular. MICS is capable of producing statistically sound, internationally comparable estimates of social indicators. The current round of MICS is focused on providing a monitoring tool for the Millennium Development Goals (MDGs), the World Fit for Children (WFFC), as well as for other major international commitments.

    Survey Objectives

    The 2005 Jamaica Multiple Indicator Cluster Survey has as its primary objectives:
    • To provide up-to-date information for assessing the situation of children and women in Jamaica;
    • To furnish data needed for monitoring progress toward goals established by the Millennium Development Goals, the goals of A World Fit For Children (WFFC), and other internationally agreed upon goals, as a basis for future action;
    • To contribute to the improvement of data and monitoring systems in Jamaica and to strengthen technical expertise in the design, implementation, and analysis of such systems.

    Survey Content

    MICS questionnaires are designed in a modular fashion that can be easily customized to the needs of a country. They consist of a household questionnaire, a questionnaire for women aged 15-49 and a questionnaire for children under the age of five (to be administered to the mother or caretaker). Other than a set of core modules, countries can select which modules they want to include in each questionnaire.

    Survey Implementation

    The survey was carried out by STATIN with the support and assistance of UNICEF and other partners. Technical assistance and training for the surveys is provided through a series of regional workshops, covering questionnaire content, sampling and survey implementation; data processing; data quality and data analysis; report writing and dissemination.

    Geographic coverage

    The survey is nationally representative and covers the whole of Jamaica.

    Analysis unit

    Households (defined as a group of persons who usually live and eat together)

    De jure household members (defined as members of the household who usually live in the household, which may include people who did not sleep in the household the previous night, but does not include visitors who slept in the household the previous night but do not usually live in the household)

    Women aged 15-49

    Children aged 0-4

    Universe

    The survey covered all de jure household members (usual residents), all women aged 15-49 years resident in the household, and all children aged 0-4 years (under age 5) resident in the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample for the Jamaica Multiple Indicator Cluster Survey (MICS) was designed to provide estimates on a large number of indicators on the situation of children and women at the national level, as well as urban and rural areas. Parishes were identified as the main sampling domains and were divided into sampling regions of equal sizes. The sample was selected in two stages. Within each sampling region, two census enumeration areas/Primary Sampling Units (PSUs) were selected with probability proportional to size. Using the household listing from the selected PSUs a systematic sample of 6,276 dwellings was drawn.

    The sampling procedures are more fully described in the sampling appendix (appendix A) of the final report.
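
    The first-stage selection described above (PSUs drawn with probability proportional to size) could be sketched roughly as follows. This is only an illustration: the entry does not say which PPS algorithm was used, so the systematic PPS scheme, the function name pps_systematic, and the dwelling counts are all assumptions.

```python
import bisect
import random
from typing import List, Sequence


def pps_systematic(sizes: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Select k unit indices with probability proportional to size.

    Assumed scheme: systematic PPS over the cumulative size totals.
    """
    cumulative = []
    total = 0.0
    for size in sizes:
        total += size
        cumulative.append(total)
    interval = total / k
    start = rng.uniform(0.0, interval)
    # Each selection point falls inside exactly one unit's cumulative range.
    points = [start + i * interval for i in range(k)]
    return [min(bisect.bisect_right(cumulative, p), len(sizes) - 1) for p in points]


# Hypothetical sampling region: ten enumeration areas with made-up dwelling counts.
dwelling_counts = [10] * 10
selected = pps_systematic(dwelling_counts, k=2, rng=random.Random(0))
```

    With this scheme a unit larger than the sampling interval can be hit more than once, so real designs usually treat such units separately.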

    Sampling deviation

    Five of the selected enumeration areas were not visited because they were inaccessible due to flooding during the fieldwork period. Sample weights were used in the calculation of national level results.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaires for the Jamaica MICS were structured questionnaires based on the MICS3 Model Questionnaire with some modifications and additions. A household questionnaire was administered in each household, which collected various information on household members including sex, age, relationship, and orphanhood status. The household questionnaire includes support to orphaned and vulnerable children, education, child labour, water and sanitation, and salt iodization, with optional modules for child discipline, child disability and security of tenure and durability of housing. In addition to a household questionnaire, questionnaires were administered in each household for women age 15-49 and children under age five. For children, the questionnaire was administered to the mother or caretaker of the child. The women's questionnaire includes women's characteristics, child mortality, tetanus toxoid, maternal and newborn health, marriage, contraception, and HIV/AIDS knowledge, with optional modules for unmet need, domestic violence, and sexual behavior. The children's questionnaire includes children's characteristics, birth registration and early learning, vitamin A, breastfeeding, care of illness, malaria, immunization, and an optional module for child development. All questionnaires and modules are provided as external resources.

    Cleaning operations

    Data editing took place at a number of stages throughout the processing (see Other processing), including:
    a) Office editing and coding
    b) During data entry
    c) Structure checking and completeness
    d) Secondary editing
    e) Structural checking of SPSS data files

    Detailed documentation of the editing of data can be found in the data processing guidelines.

    Response rate

    In the 6,276 dwellings selected for the sample, 5,604 households were found to be occupied (Table HH.1). Of these, 4,767 were successfully interviewed for a household response rate of 85.1 percent. The reason for this lower response rate is given in the previous section. In the interviewed households, 3,777 women (age 15-49) were identified. Of these, 3,647 were successfully interviewed, yielding a response rate of 96.6 percent. In addition, 1,444 children under age five were listed in the household questionnaire. Of these, questionnaires were completed for 1,427, which corresponds to a response rate of 98.8 percent.

    Overall response rates of 82.1 and 84.1 percent were calculated for the women's and under-5's interviews respectively. Note that the response rates for the Kingston Metropolitan Area (KMA) were lower than in other urban areas and in the rural area. Two factors contributed to this - more dwellings were vacant, often as a result of urban violence, and in the upper income areas access to dwellings was more difficult. In the rural areas, the rains prevented access to some households as some roads were inundated.
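
    The response rates quoted in this section can be reproduced from the reported counts. The short sketch below assumes that the overall rates are the household response rate multiplied by the individual response rate; the entry does not state the formula, but this assumption matches the published 82.1 and 84.1 percent.

```python
# Counts reported in the Response rate section of the Jamaica MICS 2005 entry.
occupied_households = 5604
interviewed_households = 4767
eligible_women = 3777
interviewed_women = 3647
eligible_children = 1444
completed_children = 1427


def rate(completed: int, eligible: int) -> float:
    """Return a response rate as a percentage."""
    return 100.0 * completed / eligible


household_rate = rate(interviewed_households, occupied_households)
women_rate = rate(interviewed_women, eligible_women)
children_rate = rate(completed_children, eligible_children)

# Assumption: overall rate = household rate x individual rate.
overall_women = household_rate * women_rate / 100.0
overall_children = household_rate * children_rate / 100.0

for label, value in [
    ("household", household_rate),
    ("women", women_rate),
    ("children under 5", children_rate),
    ("overall, women", overall_women),
    ("overall, children under 5", overall_children),
]:
    print(f"{label}: {value:.1f} percent")
```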

    Sampling error estimates

    Estimates from a sample survey are affected by two types of errors: 1) non-sampling errors and 2) sampling errors. Non-sampling errors are the results of mistakes made in the implementation of data collection and data processing. Numerous efforts were made during implementation of the 2005-2006 MICS to minimize this type of error; however, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors can be evaluated statistically. The sample of respondents to the 2005-2006 MICS is only one of many possible samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability in the results of the survey between all possible samples, and, although the degree of variability is not known exactly, it can be estimated from the survey results. The sampling errors are measured in terms of the standard error for a particular statistic (mean or percentage), which is the square root of the variance. Confidence intervals are calculated for each statistic within which the true value for the population can be assumed to fall. Plus or minus two standard errors of the statistic is used for key statistics presented in MICS, equivalent to a 95 percent confidence interval.

    If the sample of respondents had been a simple random sample, it would have been possible to use straightforward formulae for calculating sampling errors. However, the 2005-2006 MICS sample is the result of a multi-stage stratified design, and consequently needs to use more complex formulae. The SPSS complex samples module has been used to calculate sampling errors for the 2005-2006 MICS. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. This method is documented in the SPSS file CSDescriptives.pdf found under the Help, Algorithms options in SPSS.

    Sampling errors have been calculated for a select set of statistics (all of which are proportions due to the limitations of the Taylor linearization method) for the national sample, urban and rural areas, and for each of the five regions. For each statistic, the following are reported: the estimate, its standard error, the coefficient of variation (or relative error, the ratio between the standard error and the estimate), the design effect, the square root of the design effect (DEFT, the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used), and the 95 percent confidence intervals (+/-2 standard errors).

    Details of the sampling errors are presented in the sampling errors appendix to the report and in the sampling errors table presented in the external resources.
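
    The quantities listed in this section are related by simple formulas, illustrated in the sketch below. It is not the SPSS complex samples procedure: it takes a design-based standard error as given and assumes the simple-random-sample standard error of a proportion is sqrt(p(1 - p)/n), ignoring any finite population correction. The function name and the input values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SamplingErrorSummary:
    estimate: float
    standard_error: float
    coefficient_of_variation: float
    deff: float
    deft: float
    ci_lower: float
    ci_upper: float


def summarize_proportion(estimate: float, standard_error: float, n: int) -> SamplingErrorSummary:
    """Derive CV, DEFT, DEFF and a +/-2 SE interval for a proportion."""
    # Assumed SRS standard error for a proportion (no finite population correction).
    srs_se = math.sqrt(estimate * (1.0 - estimate) / n)
    deft = standard_error / srs_se
    return SamplingErrorSummary(
        estimate=estimate,
        standard_error=standard_error,
        coefficient_of_variation=standard_error / estimate,
        deff=deft ** 2,
        deft=deft,
        ci_lower=estimate - 2.0 * standard_error,
        ci_upper=estimate + 2.0 * standard_error,
    )


# Hypothetical indicator: 50 percent, design-based SE of 2 points, 1,000 cases.
summary = summarize_proportion(estimate=0.5, standard_error=0.02, n=1000)
```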

    Data

  16. Data from: Sample size to evaluate the degree of multicollinearity in rye...

    • scielo.figshare.com
    tiff
    Updated Jun 5, 2023
    Cite
    Ismael Mario Márcio Neu; Alberto Cargnelutti Filho; Marcos Toebe; Fernanda Carini; Rafael Vieira Pezzini; Daniela Lixinski Silveira (2023). Sample size to evaluate the degree of multicollinearity in rye morphological traits [Dataset]. http://doi.org/10.6084/m9.figshare.22268669.v1
    Explore at:
    tiff (available download formats)
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Ismael Mario Márcio Neu; Alberto Cargnelutti Filho; Marcos Toebe; Fernanda Carini; Rafael Vieira Pezzini; Daniela Lixinski Silveira
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT Investigation of multicollinearity allows parameters in multivariate analysis to be estimated with higher precision and with biological interpretation. In order to generate reliable estimates of the degree of multicollinearity, it is necessary to use an appropriate sample size. Thus, the objectives of this study were to determine the sample size (number of plants) necessary to estimate the indicators of the degree of multicollinearity - condition number (CN), correlation matrix determinant (DET), and variance inflation factor (VIF) - in morphological traits of rye and to verify the variability of the sample size between the indicators. Five and three uniformity trials were conducted with the cultivars BRS Progresso and Temprano, respectively. Eight morphological traits were evaluated in 780 plants in eight trials. For each trial, 22 cases were selected among the 28 formed by the combination of eight traits, taken six by six, totaling 176 cases. In each case, 197 sample sizes were planned (20, 25, 30, ..., 1,000 plants) and in each size 2,000 resampling procedures with replacement were performed, CN, DET, and VIF were determined and the average among 2,000 estimates was calculated. For each case and indicator (CN, DET, and VIF), the sample size was determined through three models: modified maximum curvature method and linear and quadratic segmented models with plateau response. The required sample size varies among indicators, with larger sample sizes required for DET, followed by CN and VIF, in that order, with at least 180, 116 and 85 plants, respectively.
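
    The counts in this abstract (28 possible cases, 176 selected cases, 197 planned sample sizes) and the resampling step can be checked with a small sketch. It is a simplification, not the authors' code: the helper names are hypothetical, and the indicator shown is the determinant of a two-trait correlation matrix (1 - r^2) rather than the six-trait CN, DET, and VIF used in the study.

```python
import math
import random
from statistics import mean

# Design counts stated in the abstract.
possible_cases = math.comb(8, 6)          # eight traits taken six at a time
total_cases = 22 * 8                      # 22 selected cases in each of 8 trials
sample_sizes = list(range(20, 1001, 5))   # 20, 25, 30, ..., 1,000 plants


def det_two_traits(rows):
    """Determinant of the 2x2 correlation matrix of two traits: 1 - r^2."""
    xs = [row[0] for row in rows]
    ys = [row[1] for row in rows]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a trait is constant in this resample")
    r = sxy / math.sqrt(sxx * syy)
    return 1.0 - r * r


def bootstrap_mean(rows, statistic, size, n_resamples, rng):
    """Average a statistic over resamples of `size` rows drawn with replacement."""
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(rows, k=size)
        estimates.append(statistic(resample))
    return mean(estimates)


# Made-up, perfectly collinear traits: the determinant should be close to 0.
plants = [(float(i), 2.0 * i + 1.0) for i in range(30)]
avg_det = bootstrap_mean(plants, det_two_traits, size=20, n_resamples=50, rng=random.Random(1))
```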

  17. The Home Depot products dataset

    • kaggle.com
    zip
    Updated Dec 13, 2021
    Cite
    Crawl Feeds (2021). The Home Depot products dataset [Dataset]. https://www.kaggle.com/datasets/crawlfeeds/the-home-depot-products-dataset
    Explore at:
    zip, 1979687 bytes (available download formats)
    Dataset updated
    Dec 13, 2021
    Authors
    Crawl Feeds
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Home Depot retail products dataset

    Content

    The sample Home Depot dataset includes more than 3,500 records.
    Total fields: 13
    Format: CSV
    Fields: url, title, images, description, product_id, sku, gtin13, brand, price, currency, availability, uniq_id, scraped_at
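
    A quick way to confirm that a downloaded file matches the 13 fields listed above is to compare its header with the expected names. The sketch below uses only the Python standard library; the helper name missing_fields and the in-memory sample rows are hypothetical and stand in for the real CSV, whose file name is not given here.

```python
import csv
import io

# Field names as listed in the dataset description.
EXPECTED_FIELDS = [
    "url", "title", "images", "description", "product_id", "sku", "gtin13",
    "brand", "price", "currency", "availability", "uniq_id", "scraped_at",
]


def missing_fields(csv_text: str) -> list:
    """Return the expected field names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in EXPECTED_FIELDS if name not in header]


# Hypothetical two-line files used only to exercise the check.
complete = ",".join(EXPECTED_FIELDS) + "\n" + ",".join(["x"] * 13) + "\n"
without_price = ",".join(f for f in EXPECTED_FIELDS if f != "price") + "\n"

print(missing_fields(complete))
print(missing_fields(without_price))
```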

    Acknowledgements

    The Crawl Feeds team extracted the data from The Home Depot. The complete dataset, with more than 1 million products, is available for download in CSV format.

    Inspiration

    The Home Depot dataset is useful for research and analysis purposes.

  18. Data from: An optimized protocol for large-scale in situ sampling and...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    zip
    Updated May 24, 2018
    Cite
    Jyothi V. Nair; Pragadheesh V. Shanmugam; Snehal D. Karpe; Uma Ramakrishnan; Shannon Olsson (2018). An optimized protocol for large-scale in situ sampling and analysis of volatile organic compounds [Dataset]. http://doi.org/10.5061/dryad.kp18283
    Explore at:
    zip (available download formats)
    Dataset updated
    May 24, 2018
    Dataset provided by
    Tata Institute of Fundamental Research
    Authors
    Jyothi V. Nair; Pragadheesh V. Shanmugam; Snehal D. Karpe; Uma Ramakrishnan; Shannon Olsson
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    India, Rajasthan, South Asia
    Description

    Chemical ecology is an ever‐expanding field with a growing interest in population‐ and community‐level studies. Many such studies are hindered due to the lack of an efficient and accelerated protocol for large‐scale sampling and analysis of chemical compounds. Here, we present an optimized protocol for such large‐scale study of volatiles. A large‐scale in situ study to understand the role of semiochemicals in variation in the mating success of lekking blackbuck was conducted. Suitable methods for sampling and statistical analysis were identified by testing and comparing the efficiencies of available techniques to reduce analysis time while retaining sensitivity and comprehensiveness. Solid‐phase extraction using polydimethylsiloxane, analysis using a semiautomated detection of retention time and base peak, and statistical analysis using a random forest algorithm were identified as the most efficient methods for large‐scale in situ sampling and analysis of volatiles. The protocol for large‐scale volatile analysis can facilitate evolutionary and metaecological studies of volatiles in situ from all types of biological samples. The protocol has potential for wider application with the analysis and interpretation methods being suitable for all kinds of semiochemicals, including nonvolatile chemicals.

  19. European Union Statistics on Income and Living Conditions 2008 -...

    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Eurostat (2019). European Union Statistics on Income and Living Conditions 2008 - Cross-Sectional User Database - Germany [Dataset]. https://catalog.ihsn.org/catalog/5618
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Eurostat (https://ec.europa.eu/eurostat)
    Time period covered
    2008
    Area covered
    Germany
    Description

    Abstract

    EU-SILC has become the EU reference source for comparative statistics on income distribution and social exclusion at European level, particularly in the context of the "Program of Community action to encourage cooperation between Member States to combat social exclusion" and for producing structural indicators on social cohesion for the annual spring report to the European Council. The first priority is to be given to the delivery of comparable, timely and high quality cross-sectional data.

    There are two types of datasets: 1) Cross-sectional data pertaining to fixed time periods, with variables on income, poverty, social exclusion and living conditions. 2) Longitudinal data pertaining to individual-level changes over time, observed periodically - usually over four years.

    Social exclusion and housing-condition information is collected at household level. Income at a detailed component level is collected at personal level, with some components included in the "Household" section. Labour, education and health observations only apply to persons 16 and older. EU-SILC was established to provide data on structural indicators of social cohesion (at-risk-of-poverty rate, S80/S20 and gender pay gap) and to provide relevant data for the two 'open methods of coordination' in the field of social inclusion and pensions in Europe.

    The sixth revision of the 2008 Cross-Sectional User Database (UDB) as released in May 2014 is documented here.

    Geographic coverage

    National

    Analysis unit

    • Households;
    • Individuals 16 years and older.

    Universe

    The survey covered all household members over 16 years old. Persons living in collective households and in institutions are generally excluded from the target population.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    On the basis of various statistical and practical considerations and the precision requirements for the most critical variables, the minimum effective sample sizes to be achieved were defined. Sample size for the longitudinal component refers, for any pair of consecutive years, to the number of households successfully interviewed in the first year in which all or at least a majority of the household members aged 16 or over are successfully interviewed in both the years.

    For the cross-sectional component, the plans are to achieve the minimum effective sample size of around 131,000 households in the EU as a whole (137,000 including Iceland and Norway). The allocation of the EU sample among countries represents a compromise between two objectives: the production of results at the level of individual countries, and production for the EU as a whole. Requirements for the longitudinal data will be less important. For this component, an effective sample size of around 98,000 households (103,000 including Iceland and Norway) is planned.

    Member States using registers for income and other data may use a sample of persons (selected respondents) rather than a sample of complete households in the interview survey. The minimum effective sample size in terms of the number of persons aged 16 or over to be interviewed in detail is in this case taken as 75 % of the figures shown in columns 3 and 4 of the table I, for the cross-sectional and longitudinal components respectively.

    The reference is to the effective sample size, which is the size required if the survey were based on simple random sampling (design effect in relation to the 'risk of poverty rate' variable = 1.0). The actual sample sizes will have to be larger to the extent that the design effects exceed 1.0 and to compensate for all kinds of non-response. Furthermore, the sample size refers to the number of valid households which are households for which, and for all members of which, all or nearly all the required information has been obtained. For countries with a sample of persons design, information on income and other data shall be collected for the household of each selected respondent and for all its members.
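
    The relationship between the effective and the actual sample size described above can be written as a small calculation. The sketch below is an illustration under stated assumptions: it inflates the effective size by the design effect and divides by an expected response rate, which is a common planning rule but is not spelled out in this entry. The function names and all numeric inputs are hypothetical.

```python
import math


def required_sample_size(effective_size: int, design_effect: float, response_rate: float) -> int:
    """Households to select so that the achieved sample has the given effective size.

    Assumed planning rule: n = n_eff * deff / response_rate.
    """
    if design_effect <= 0 or not 0 < response_rate <= 1:
        raise ValueError("design_effect must be > 0 and response_rate in (0, 1]")
    return math.ceil(effective_size * design_effect / response_rate)


def persons_minimum(household_minimum: int) -> int:
    """75 % rule for countries that sample selected respondents instead of households."""
    return math.ceil(0.75 * household_minimum)


# Hypothetical country: effective size 1,000, design effect 1.5, 75 % response.
print(required_sample_size(1000, 1.5, 0.75))
print(persons_minimum(8000))
```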

    At the beginning, a cross-sectional representative sample of households is selected. It is divided into, say, 4 sub-samples, each by itself representative of the whole population and similar in structure to the whole sample. One sub-sample is purely cross-sectional and is not followed up after the first round. Respondents in the second sub-sample are requested to participate in the panel for 2 years, in the third sub-sample for 3 years, and in the fourth for 4 years. From year 2 onwards, one new panel is introduced each year, with a request for participation for 4 years. In any one year, the sample consists of 4 sub-samples, which together constitute the cross-sectional sample. In year 1 they are all new samples; in all subsequent years, only one is a new sample. In year 2, three are panels in the second year; in year 3, one is a panel in the second year and two in the third year; in subsequent years, one is a panel for the second year, one for the third year, and one for the fourth (final) year.
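
    The rotation scheme in the preceding paragraph can be traced year by year with a few lines of code. The sketch below follows the description literally (four initial sub-samples followed for 1, 2, 3 and 4 years, then one new 4-year panel per year); the function name active_waves is hypothetical.

```python
from typing import List


def active_waves(year: int) -> List[int]:
    """Return the wave number (1 = new sample) of each sub-sample interviewed in `year`."""
    # Year-1 sub-samples are followed for 1, 2, 3 and 4 years respectively.
    panels = [(1, duration) for duration in (1, 2, 3, 4)]
    # From year 2 onwards, one new 4-year panel starts each year.
    panels += [(start, 4) for start in range(2, year + 1)]
    return sorted(
        year - start + 1
        for start, duration in panels
        if start <= year < start + duration
    )


for y in range(1, 7):
    print(y, active_waves(y))
```

    Each year yields four active sub-samples, and from year 4 onwards the pattern settles into one sub-sample in each of waves 1 to 4.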

    According to the Commission Regulation on sampling and tracing rules, the selection of the sample will be drawn according to the following requirements:

    1. For all components of EU-SILC (whether survey or register based), the cross-sectional and longitudinal (initial sample) data shall be based on a nationally representative probability sample of the population residing in private households within the country, irrespective of language, nationality or legal residence status. All private households and all persons aged 16 and over within the household are eligible for the operation.
    2. Representative probability samples shall be achieved both for households, which form the basic units of sampling, data collection and data analysis, and for individual persons in the target population.
    3. The sampling frame and methods of sample selection shall ensure that every individual and household in the target population is assigned a known and non-zero probability of selection.
    4. By way of exception, paragraphs 1 to 3 shall apply in Germany exclusively to the part of the sample based on probability sampling according to Article 8 of the Regulation of the European Parliament and of the Council (EC) No 1177/2003 concerning Community Statistics on Income and Living Conditions.

    Article 8 of the EU-SILC Regulation of the European Parliament and of the Council mentions:

    1. The cross-sectional and longitudinal data shall be based on nationally representative probability samples.
    2. By way of exception to paragraph 1, Germany shall supply cross-sectional data based on a nationally representative probability sample for the first time for the year 2008. For the year 2005, Germany shall supply data for one fourth based on probability sampling and for three fourths based on quota samples, the latter to be progressively replaced by random selection so as to achieve fully representative probability sampling by 2008. For the longitudinal component, Germany shall supply for the year 2006 one third of longitudinal data (data for year 2005 and 2006) based on probability sampling and two thirds based on quota samples. For the year 2007, half of the longitudinal data relating to years 2005, 2006 and 2007 shall be based on probability sampling and half on quota sample. After 2007 all of the longitudinal data shall be based on probability sampling.

    Detailed information about sampling is available in Quality Reports in Documentation.

    Mode of data collection

    Mixed

  20. Synthetic Univariate Time Series Data

    • kaggle.com
    zip
    Updated May 4, 2025
    Cite
    Nikhil R (2025). Synthetic Univariate Time Series Data [Dataset]. https://www.kaggle.com/datasets/nikhilr612/synthetic-univariate-time-series-data
    Explore at:
    zip, 204494 bytes (available download formats)
    Dataset updated
    May 4, 2025
    Authors
    Nikhil R
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A synthetic dataset made by applying noise and irregular sampling to well-known signals such as triangle, exponentially decaying sinusoids, and HeaviSine. This is mainly intended for an informal comparative study of performance of Time Series models. Each file consists of exactly 500 data points sampled from the curve with added levels of noise, e.g., "constant_white_high_step_high" indicates that high white-noise and irregular sampling were applied to a constant signal. For slightly more realistic scenarios, a 500-point simulation of a Wiener process is also included.
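
    To give a feel for how such series might be produced, the sketch below generates 500 irregularly spaced samples of a triangle wave with added white noise, plus a 500-point Wiener process. It is not the dataset author's generator: the function names, noise level, step jitter, and time step are all assumptions chosen for illustration.

```python
import math
import random
from typing import List


def triangle(t: float, period: float = 1.0) -> float:
    """Triangle wave with values in [-1, 1]."""
    phase = (t / period) % 1.0
    return 4.0 * abs(phase - 0.5) - 1.0


def irregular_times(n: int, mean_step: float, jitter: float, rng: random.Random) -> List[float]:
    """Strictly increasing sample times whose steps vary by +/- jitter (0 <= jitter < 1)."""
    times, t = [], 0.0
    for _ in range(n):
        t += mean_step * (1.0 + rng.uniform(-jitter, jitter))
        times.append(t)
    return times


def wiener_path(n: int, dt: float, rng: random.Random) -> List[float]:
    """Simulate n points of a Wiener process starting at 0."""
    path = [0.0]
    for _ in range(n - 1):
        # Increments are independent normals with variance dt.
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


rng = random.Random(42)
times = irregular_times(500, mean_step=0.01, jitter=0.8, rng=rng)
noisy_triangle = [triangle(t) + rng.gauss(0.0, 0.3) for t in times]
wiener = wiener_path(500, dt=0.01, rng=rng)
```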
