100+ datasets found

Sample data analysis
kaggle.com
zip
Updated Apr 28, 2023
Cite
Abdul Hamith (2023). Sample data analysis [Dataset]. https://www.kaggle.com/datasets/abdulhamith/sample-data-analysis
Available download formats: zip (998859 bytes)
Dataset updated
Apr 28, 2023
Authors
Abdul Hamith
Description
Dataset

This dataset was created by Abdul Hamith

Data from: RESEARCH METHODOLOGY FOR NOVELTY TECHNOLOGY
scielo.figshare.com
search.datacite.org
jpeg
Updated May 31, 2023
Cite
P.C. Lai (2023). RESEARCH METHODOLOGY FOR NOVELTY TECHNOLOGY [Dataset]. http://doi.org/10.6084/m9.figshare.7482734.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.7482734.v1
Dataset updated
May 31, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
P.C. Lai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: This paper contributes to the existing literature by reviewing the research methodology and the literature, with a focus on potential applications for the novelty technology of the single platform E-payment. The review covers, but is not restricted to, the subjects, population, sample size requirement, data collection method and measurement of variables, pilot study and statistical techniques for data analysis. It sheds light on potential applications that future researchers, students and others can use to conceptualize, operationalize and analyze the underlying research methodology when developing their own.
Political Analysis Using R: Example Code and Data, Plus Data for Practice...
dataverse.harvard.edu
search.dataone.org
Updated Apr 28, 2020
Cite
Jamie Monogan (2020). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
Unique identifier
https://doi.org/10.7910/DVN/ARKOTI
Dataset updated
Apr 28, 2020
Dataset provided by
Harvard Dataverse
Authors
Jamie Monogan
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
Data from: PISA Data Analysis Manual: SPSS, Second Edition
catalog.data.gov
s.cnmilf.com
Updated Mar 30, 2021
+ more versions
Cite
U.S. Department of State (2021). PISA Data Analysis Manual: SPSS, Second Edition [Dataset]. https://catalog.data.gov/dataset/pisa-data-analysis-manual-spss-second-edition
Dataset updated
Mar 30, 2021
Dataset provided by
United States Department of State (http://state.gov/)
Description
The OECD Programme for International Student Assessment (PISA) surveys collected data on students’ performances in reading, mathematics and science, as well as contextual information on students’ background, home characteristics and school factors which could influence performance. This publication includes detailed information on how to analyse the PISA data, enabling researchers to both reproduce the initial results and to undertake further analyses. In addition to the inclusion of the necessary techniques, the manual also includes a detailed account of the PISA 2006 database and worked examples providing full syntax in SPSS.
Household Health Survey 2012-2013, Economic Research Forum (ERF)...
catalog.ihsn.org
datacatalog.ihsn.org
Updated Jun 26, 2017
+ more versions
Cite
Central Statistical Organization (CSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://catalog.ihsn.org/index.php/catalog/6937
Dataset updated
Jun 26, 2017
Dataset provided by
Central Statistical Organization (CSO)
Economic Research Forum
Kurdistan Regional Statistics Office (KRSO)
Time period covered
2012 - 2013
Area covered
Iraq
Description
Abstract

The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the Central Statistical Organization (CSO) and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.

The survey has six main objectives. These objectives are:

Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.

Provide a comprehensive data system to assess household social and economic conditions and prepare the indicators related to human development.

Provide data that meet the needs and requirements of national accounts.

Provide detailed indicators on consumption expenditure that support decisions related to production, consumption, export and import.

Provide detailed indicators on the sources of household and individual income.

Provide the data necessary to formulate a new consumer price index.

The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See "Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf", provided in the external resources, for further information on how the original variables map onto the harmonized ones, on the variables' availability in both survey years, and for relevant comments.

Geographic coverage

National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

Kind of data

Sample survey data [ssd]

Sampling procedure

----> Design:

The sample size was 25488 households for the whole of Iraq: 216 households in each of the 118 districts, drawn from 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.

----> Sample frame:

The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all the governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size, to reach 2832 units (clusters). Stage 2: 9 households from each primary sampling unit were selected to create a cluster. The total sample size was thus 25488 households distributed across the governorates, 216 households in each district.

----> Sampling Stages:

In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, with an implicit breakdown by urban and rural areas and by geography (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. The sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units; in that case a sampling unit is selected more than once, and two or more clusters may be drawn from the same enumeration unit when necessary.
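
The design figures quoted in this section are internally consistent, which a few lines of Python make easy to check. This is only an arithmetic sketch: the function and variable names are illustrative, and the inputs are the numbers stated in the text (118 districts, 24 sample points per district, 9 households per sample point).

```python
# Figures taken from the sampling description; the names are illustrative.
DISTRICTS = 118
CLUSTERS_PER_DISTRICT = 24
HOUSEHOLDS_PER_CLUSTER = 9


def sample_design(districts, clusters_per_district, households_per_cluster):
    """Return (total clusters, households per district, total households)."""
    total_clusters = districts * clusters_per_district
    households_per_district = clusters_per_district * households_per_cluster
    total_households = total_clusters * households_per_cluster
    return total_clusters, households_per_district, total_households


clusters, per_district, households = sample_design(
    DISTRICTS, CLUSTERS_PER_DISTRICT, HOUSEHOLDS_PER_CLUSTER
)
print(clusters, per_district, households)  # 2832 216 25488
```

The three results match the 2832 clusters, 216 households per district and 25488 households reported for the survey.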

Mode of data collection

Face-to-face [f2f]

Research instrument

----> Preparation:

The questionnaire of the 2006 survey was adopted as the basis for the 2012 questionnaire, to which many revisions were made. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, producing the final version used in the actual survey.

----> Questionnaire Parts:

The questionnaire consists of four parts, each with several sections:

Part 1: Socio – Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: Housing
- Section 5: Education
- Section 6: Health
- Section 7: Physical measurements
- Section 8: Job seeking and previous job

Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non – Food Commodities and Services (past 30 days).
- Section 10: Expenditures on Non – Food Commodities and Services (past 90 days).
- Section 11: Expenditures on Non – Food Commodities and Services (past 12 months).
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days).
- Section 12, Table 1: Meals Had Within the Residential Unit.
- Section 12, Table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

Part 3: Income and Other Data:
- Section 13: Job
- Section 14: Paid jobs
- Section 15: Agriculture, forestry and fishing
- Section 16: Household non – agricultural projects
- Section 17: Income from ownership and transfers
- Section 18: Durable goods
- Section 19: Loans, advances and subsidies
- Section 20: Shocks and strategy of dealing in the households
- Section 21: Time use
- Section 22: Justice
- Section 23: Satisfaction in life
- Section 24: Food consumption during past 7 days

Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and on frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages were allocated for recording the expenditures of each day, thus the roster consists of 14 pages.

Cleaning operations

----> Raw Data:

Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local Supervisor: Checks that the questions have been correctly completed.
3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

----> Harmonized Data:

The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.

The harmonization process starts with raw data files received from the Statistical Office.

A program is generated for each dataset to create harmonized variables.

Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

Response rate

The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).
Sample Sales Dataset
cubig.ai
zip
Updated Jun 15, 2025
Cite
CUBIG (2025). Sample Sales Dataset [Dataset]. https://cubig.ai/store/products/477/sample-sales-dataset
Available download formats: zip
Dataset updated
Jun 15, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns that includes a variety of sales-related data, including order numbers, product information, quantity, unit price, sales, order date, order status, and customer and delivery information.

2) Data Utilization
(1) Characteristics of the Sample Sales Data:
• This dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.).
(2) Uses of the Sample Sales Data:
• Analysis of sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region.
• Segmentation and marketing strategies: Customer groups can be segmented based on customer information, transaction size, and regional data, and the segments used to design targeted marketing and customized promotion strategies.
Maximum Analysis Sample Sizes by Analysis Type and Data Source.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Apr 6, 2016
Cite
Murray, Aja Louise; Obsuth, Ingrid; Sutherland, Alex; Eisner, Manuel; Pilbeam, Liv; Cope, Aiden (2016). Maximum Analysis Sample Sizes by Analysis Type and Data Source. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001509657
Dataset updated
Apr 6, 2016
Authors
Murray, Aja Louise; Obsuth, Ingrid; Sutherland, Alex; Eisner, Manuel; Pilbeam, Liv; Cope, Aiden
Description
Maximum Analysis Sample Sizes by Analysis Type and Data Source.
Health and Retirement Study (HRS)
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Cite
Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
Unique identifier
https://doi.org/10.7910/DVN/ELEKOY
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description
analyze the health and retirement study (hrs) with r

the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original.

figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle.

but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you.

the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

this new github repository contains five scripts:

1992 - 2010 download HRS microdata.R
- loop through every year and every file, download, then unzip everything in one big party

import longitudinal RAND contributed files.R
- create a SQLite database (.db) on the local disk
- load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

longitudinal RAND - analysis examples.R
- connect to the sql database created by the 'import longitudinal RAND contributed files' program
- create two database-backed complex sample survey objects, using a taylor-series linearization design
- perform a mountain of analysis examples with wave weights from two different points in the panel

import example HRS file.R
- load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
- parse through the IF block at the bottom of the sas importation script, blank out a number of variables
- save the file as an R data file (.rda) for fast loading later

replicate 2002 regression.R
- connect to the sql database created by the 'import longitudinal RAND contributed files' program
- create a database-backed complex sample survey object, using a taylor-series linearization design
- exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.

click here to view these five scripts

for more detail about the health and retirement study (hrs), visit:
- michigan's hrs homepage
- rand's hrs homepage
- the hrs wikipedia page
- a running list of publications using hrs

notes:

exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself.

confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
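
The chunked-import idea mentioned in the description (loading large files into a SQLite database a piece at a time so they never sit in RAM all at once) can be sketched as follows. The original scripts are written in R; this is a Python stand-in using only the standard library, and the table name, column names and generated rows are hypothetical placeholders rather than actual RAND HRS variables.

```python
import sqlite3
from itertools import islice


def load_in_chunks(conn, table, columns, rows, chunk_size=1000):
    """Insert an iterable of row tuples into `table`, chunk_size rows at a time."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    insert_sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    rows = iter(rows)
    total = 0
    while True:
        # Only one chunk is materialised in memory at any time.
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        conn.executemany(insert_sql, chunk)
        conn.commit()
        total += len(chunk)
    return total


# In-memory database for the demo; the scripts described use an on-disk .db file.
conn = sqlite3.connect(":memory:")
# Hypothetical data: a generator of (respondent id, age) pairs.
fake_rows = ((i, 50 + i % 40) for i in range(2500))
n_loaded = load_in_chunks(conn, "rand_demo", ["respondent_id", "age"], fake_rows)
n_stored = conn.execute("SELECT COUNT(*) FROM rand_demo").fetchone()[0]
print(n_loaded, n_stored)  # 2500 2500
```

Because the rows come from a generator and are consumed with `islice`, peak memory depends on `chunk_size` rather than on the size of the source file.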
Considerations for analyzing EMA data (Oleson et al., 2021)
asha.figshare.com
pdf
Updated May 30, 2023
Cite
Jacob J. Oleson; Michelle A. Jones; Erik J. Jorgensen; Yu-Hsiang Wu (2023). Considerations for analyzing EMA data (Oleson et al., 2021) [Dataset]. http://doi.org/10.23641/asha.17155961.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.23641/asha.17155961.v1
Dataset updated
May 30, 2023
Dataset provided by
American Speech–Language–Hearing Association (https://www.asha.org/)
Authors
Jacob J. Oleson; Michelle A. Jones; Erik J. Jorgensen; Yu-Hsiang Wu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose: The analysis of Ecological Momentary Assessment (EMA) data can be difficult to conceptualize due to the complexity of how the data are collected. The goal of this tutorial is to provide an overview of statistical considerations for analyzing observational data arising from EMA studies.

Method: EMA data are collected in a variety of ways, complicating the statistical analysis. We focus on fundamental statistical characteristics of the data and general purpose statistical approaches to analyzing EMA data. We implement those statistical approaches using a recent study involving EMA.

Results: The linear or generalized linear mixed-model statistical approach can adequately capture the challenges resulting from EMA collected data if properly set up. Additionally, while sample size depends on both the number of participants and the number of survey responses per participant, having more participants is more important than the number of responses per participant.

Conclusion: Using modern statistical methods when analyzing EMA data and adequately considering all of the statistical assumptions being used can lead to interesting and important findings when using EMA.

Supplemental Material S1. Power for given effect sizes, number of participants, and number of surveys per individual for a two independent groups comparison.

Supplemental Material S2. Power for given effect sizes, number of participants, and number of surveys per individual for a paired groups comparison.

Oleson, J. J., Jones, M. A., Jorgensen, E. J., & Wu, Y.-H. (2021). Statistical considerations for analyzing Ecological Momentary Assessment data. Journal of Speech, Language, and Hearing Research. Advance online publication. https://doi.org/10.1044/2021_JSLHR-21-00081
Streaming Service Data
kaggle.com
Updated Dec 19, 2024
Cite
Chad Wambles (2024). Streaming Service Data [Dataset]. https://www.kaggle.com/datasets/chadwambles/streaming-service-data
Dataset updated
Dec 19, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chad Wambles
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
A dataset I generated to showcase a sample set of user data for a fictional streaming service. This data is great for practicing SQL, Excel, Tableau, or Power BI.

1000 rows and 25 columns of connected data.

See below for column descriptions.

Enjoy :)
Data from: Evaluating Supplemental Samples in Longitudinal Research:...
tandf.figshare.com
txt
Updated Feb 9, 2024
Cite
Laura K. Taylor; Xin Tong; Scott E. Maxwell (2024). Evaluating Supplemental Samples in Longitudinal Research: Replacement and Refreshment Approaches [Dataset]. http://doi.org/10.6084/m9.figshare.12162072.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12162072.v1
Dataset updated
Feb 9, 2024
Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
Authors
Laura K. Taylor; Xin Tong; Scott E. Maxwell
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Despite the wide application of longitudinal studies, they are often plagued by missing data and attrition. The majority of methodological approaches focus on participant retention or modern missing data analysis procedures. This paper, however, takes a new approach by examining how researchers may supplement the sample with additional participants. First, refreshment samples use the same selection criteria as the initial study. Second, replacement samples identify auxiliary variables that may help explain patterns of missingness and select new participants based on those characteristics. A simulation study compares these two strategies for a linear growth model with five measurement occasions. Overall, the results suggest that refreshment samples lead to less relative bias, greater relative efficiency, and more acceptable coverage rates than replacement samples or not supplementing the missing participants in any way. Refreshment samples also have high statistical power. The comparative strengths of the refreshment approach are further illustrated through a real data example. These findings have implications for assessing change over time when researching at-risk samples with high levels of permanent attrition.
Market Basket Analysis
kaggle.com
zip
Updated Dec 9, 2021
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Available download formats: zip (23875170 bytes)
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on the itemset that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to give customers itemset suggestions, so that we can increase customer engagement, improve customer experience and identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rules are most often used when you want to find associations between different objects in a set, for example frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(computer mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
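
The three metrics can be computed directly from the counts in the example. The sketch below uses the standard definitions for a rule A => B (support = P(A and B), confidence = support / P(A), lift = confidence / P(B)); the function name and argument names are illustrative and not part of the original notebook, which uses R.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total
    p_antecedent = n_antecedent / n_total
    p_consequent = n_consequent / n_total
    confidence = support / p_antecedent   # P(consequent | antecedent)
    lift = confidence / p_consequent      # above 1 means positive association
    return support, confidence, lift


# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1, as here, indicates that the two items are bought together far more often than would be expected if the purchases were independent.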

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: .xlsx

Number of Rows: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.


Libraries in R

First, we need to load the required libraries. Each is briefly described below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.


Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.


Next, we clean the data frame by removing missing values.


To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
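
The conversion step, grouping the rows of the table into one basket of items per invoice, can be illustrated with a small sketch. The original walkthrough does this in R with the arules package; the Python version below is a stand-in that assumes only the BillNo and Itemname columns are needed, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (bill_no, item_name) pairs into one set of items per bill."""
    baskets = defaultdict(set)
    for bill_no, item_name in rows:
        if bill_no is None or item_name is None:
            continue  # drop rows with missing values, mirroring the cleaning step
        baskets[bill_no].add(item_name)
    return dict(baskets)


# Hypothetical rows; the real file has 7 attributes, only two are used here.
rows = [
    ("536365", "mouse"),
    ("536365", "mouse mat"),
    ("536366", "keyboard"),
    ("536366", None),
]
transactions = to_transactions(rows)
print(transactions)
```

Each value in the resulting dictionary is the itemset of one transaction, which is the shape an association-rule algorithm such as Apriori expects as input.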
Superstore Marketing Campaign Dataset
kaggle.com
zip
Updated Jan 2, 2023
Cite
Ahsan Raza (2023). Superstore Marketing Campaign Dataset [Dataset]. https://www.kaggle.com/datasets/ahsan81/superstore-marketing-campaign-dataset/code
Available download formats: zip (56728 bytes)
Dataset updated
Jan 2, 2023
Authors
Ahsan Raza
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context - A superstore is planning its year-end sale. It wants to launch a new offer, a gold membership that gives a 20% discount on all purchases, for only $499 (it costs $999 on other days). The offer will be valid only for existing customers, and a phone-call campaign is currently being planned for them. The management feels that the best way to reduce the cost of the campaign is to build a predictive model that classifies customers who might purchase the offer.

Objective - The superstore wants to predict the likelihood of a customer giving a positive response and to identify the different factors that affect the customer's response. You need to analyze the data provided to identify these factors and then build a prediction model to predict the probability that a customer will give a positive response.
Data from: Sediment sample analysis for calcium carbonate of sample...
catalog.data.gov
data.usgs.gov
+2 more
Updated Nov 25, 2025
Cite
U.S. Geological Survey (2025). Sediment sample analysis for calcium carbonate of sample collected in the East and West Flower Garden regions, northwestern Gulf of Mexico outer shelf [Dataset]. https://catalog.data.gov/dataset/sediment-sample-analysis-for-calcium-carbonate-of-sample-collected-in-the-east-and-west-fl
Explore at:
Dataset updated
Nov 25, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Gulf of Mexico (Gulf of America)
Description
This file contains location and carbonate content analysis of samples taken during Cruise No. FERL01052 aboard the NOAA Ship Ferrel. These samples were taken on the East and West Flower Garden Banks of the Flower Garden Banks National Marine Sanctuary between May 28, 2002 and June 3, 2002. The information collected during this cruise is intended for a preliminary geologic interpretation of the surficial sediment distribution in order to determine sites for future sample collection. The interpretations presented in this Open File Report are subject to change with future data acquisition.
Data from: Analysis of dust samples from the Russian part of the ISS
data.nasa.gov
datasets.ai
+3 more
Updated Mar 31, 2025
Cite
nasa.gov (2025). Analysis of dust samples from the Russian part of the ISS [Dataset]. https://data.nasa.gov/dataset/analysis-of-dust-samples-from-the-russian-part-of-the-iss-c2f2a
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
Our study focuses on the hardiest microorganisms inhabiting the ISS in order to assess their diversity and capabilities to resist certain stresses. We specifically selected dust samples from the Russian modules that were obtained 8-10 years ago and stored since then under sealed conditions on Earth. Targeting long-time survivors and spore-forming microorganisms, we assessed the cultivable microbial community of these samples in order to obtain model microbial strains that could help to analyze specific adaptation towards environmental stresses such as desiccation and lack of nutrients. In this study, we analyzed these microorganisms with respect to their resistance towards thermal stress and exposure to clinically relevant antibiotics. In addition, we assessed the bacterial and archaeal community via molecular methods (NGS sequencing) and compared our new data with the previously derived information from the ISS microbiome.
Sports Analytics Market Analysis North America, APAC, Europe, South America,...
technavio.com
pdf
Updated Jan 29, 2025
Cite
Technavio (2025). Sports Analytics Market Analysis North America, APAC, Europe, South America, Middle East and Africa - US, Canada, China, Germany, UK, India, Japan, France, Italy, South Korea - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/sports-analytics-market-industry-analysis
Explore at:
Available download formats: pdf
Dataset updated
Jan 29, 2025
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-notice
Time period covered
2025 - 2029
Description
Sports Analytics Market Size 2025-2029

The sports analytics market size is forecast to increase by USD 8.4 billion, at a CAGR of 28.5% from 2024 to 2029. Increasing adoption of cloud-based deployment solutions will drive the sports analytics market.

Major Market Trends & Insights

North America dominated the market and accounted for 38% of the growth during the forecast period.
By Type: the Football segment was valued at USD 749.30 billion in 2023.
By Solution: the Player analysis segment accounted for the largest market revenue share in 2023.

Market Size & Forecast

Market Opportunities: USD 584.13 million
Market Future Opportunities: USD 8403.30 million
CAGR: 28.5%
North America: Largest market in 2023
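The relationship between a growth increment and a CAGR can be made concrete with a short calculation. The sketch below assumes that the USD 8403.30 million figure is the increase in market size between 2024 and 2029 and that the 28.5% rate compounds annually over those five years. The report excerpt does not state a base-year market size, so the value computed here is only what those two assumptions imply.

```python
def growth_factor(cagr, years):
    """Total multiplier after compounding `cagr` annually for `years` years."""
    return (1.0 + cagr) ** years


def implied_base(increment, cagr, years):
    """Starting size implied by a given absolute increase and CAGR."""
    return increment / (growth_factor(cagr, years) - 1.0)


cagr = 0.285               # 28.5% per year, as quoted
years = 5                  # 2024 to 2029 (assumed compounding periods)
increment_musd = 8403.30   # USD million, assumed to be the 2024-2029 increase

base_musd = implied_base(increment_musd, cagr, years)
end_musd = base_musd * growth_factor(cagr, years)

print(f"implied 2024 size: {base_musd:.1f} USD million")
print(f"implied 2029 size: {end_musd:.1f} USD million")
```

A check of this kind is also a quick way to test whether individual segment values quoted elsewhere in a report are on the same scale as the overall market figures.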

Market Summary

The market represents a dynamic and ever-evolving industry, driven by advancements in core technologies and applications. Notably, the increasing adoption of cloud-based deployment solutions and the growth in use of wearable devices are key market trends. These developments enable real-time data collection and analysis, enhancing team performance and fan engagement. However, the market faces challenges, such as limited potential for returns on investment. Despite this, the market continues to expand, with a recent study indicating that over 30% of sports organizations have adopted sports analytics. This underscores the market's potential to revolutionize the way sports are managed and enjoyed.

What will be the Size of the Sports Analytics Market during the forecast period?

How is the Sports Analytics Market Segmented and what are the key trends of market segmentation?

The sports analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Type: Football, Cricket, Hockey, Tennis, Others
Solution: Player analysis, Team performance analysis, Health assessment, Fan engagement analysis, Others
Geography: North America (US, Canada); Europe (France, Germany, Italy, UK); APAC (China, India, Japan, South Korea); Rest of World (ROW)

By Type Insights

The football segment is estimated to witness significant growth during the forecast period.

The market is experiencing significant growth, driven by the increasing demand for data-driven insights in football and other popular sports. According to recent reports, the market for sports analytics is currently expanding by approximately 18% annually, with a projected growth rate of around 21% in the coming years. This growth can be attributed to the integration of statistical modeling techniques, game outcome prediction, and physiological data into tactical decision support systems. Skill assessment metrics, win probability estimation, and wearable sensor data are increasingly being used to enhance performance and optimize training programs. Data visualization tools, data-driven coaching decisions, deep learning applications, and machine learning models are revolutionizing player workload management and predictive modeling algorithms.

The Football segment was valued at USD 749.30 billion in 2019 and showed a gradual increase during the forecast period.

Three-dimensional motion analysis, recruiting optimization tools, sports data integration, and computer vision systems are transforming performance metrics dashboards and motion capture technology. Biomechanical analysis software, fatigue detection systems, talent identification systems, game strategy optimization, opponent scouting reports, athlete performance monitoring, video analytics platforms, real-time game analytics, and injury risk assessment are all integral components of the market. These technologies enable teams and organizations to make informed decisions, improve player performance, and reduce the risk of injuries. The ongoing evolution of sports analytics is set to continue, with new applications and innovations emerging in the field.

Regional Analysis

North America is estimated to contribute 38% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

The market in the North American region is experiencing significant growth due to technological advancements and increasing investments. In 2024, the US and Canada were major contributors to this expansion. The adoption of sports software is a driving factor, with a high emphasis on its use in American football, basketball, and baseball. Major sports leagues in the US are
Data from: MSL MARS SAMPLE ANALYSIS AT MARS 5 RDR LEVEL 2 V1.0
data.nasa.gov
s.cnmilf.com
+1 more
Updated Mar 31, 2025
Cite
nasa.gov (2025). MSL MARS SAMPLE ANALYSIS AT MARS 5 RDR LEVEL 2 V1.0 [Dataset]. https://data.nasa.gov/dataset/msl-mars-sample-analysis-at-mars-5-rdr-level-2-v1-0-96708
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
TBD (one or two paragraph summary)
Pre and Post-Exercise Heart Rate Analysis
kaggle.com
zip
Updated Sep 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdullah M Almutairi (2024). Pre and Post-Exercise Heart Rate Analysis [Dataset]. https://www.kaggle.com/datasets/abdullahmalmutairi/pre-and-post-exercise-heart-rate-analysis
Explore at:
Available download formats: zip (3857 bytes)
Dataset updated
Sep 29, 2024
Authors
Abdullah M Almutairi
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Overview:

This dataset contains simulated (hypothetical) but almost realistic (based on AI) data related to sleep, heart rate, and exercise habits of 500 individuals. It includes both pre-exercise and post-exercise resting heart rates, allowing for analyses such as a dependent t-test (Paired Sample t-test) to observe changes in heart rate after an exercise program. The dataset also includes additional health-related variables, such as age, hours of sleep per night, and exercise frequency.

The data is designed for tasks involving hypothesis testing, health analytics, or even machine learning applications that predict changes in heart rate based on personal attributes and exercise behavior. It can be used to understand the relationships between exercise frequency, sleep, and changes in heart rate.

File:
Filename: heart_rate_data.csv
File Format: CSV

- Features (Columns):

Age:
Description: The age of the individual.
Type: Integer
Range: 18-60 years
Relevance: Age is an important factor in determining heart rate and the effects of exercise.

Sleep Hours:
Description: The average number of hours the individual sleeps per night.
Type: Float
Range: 3.0 - 10.0 hours
Relevance: Sleep is a crucial health metric that can impact heart rate and exercise recovery.

Exercise Frequency (Days/Week):
Description: The number of days per week the individual engages in physical exercise.
Type: Integer
Range: 1-7 days/week
Relevance: More frequent exercise may lead to greater heart rate improvements and better cardiovascular health.

Resting Heart Rate Before:
Description: The individual’s resting heart rate measured before beginning a 6-week exercise program.
Type: Integer
Range: 50 - 100 bpm (beats per minute)
Relevance: This is a key health indicator, providing a baseline measurement for the individual’s heart rate.

Resting Heart Rate After:
Description: The individual’s resting heart rate measured after completing the 6-week exercise program.
Type: Integer
Range: 45 - 95 bpm (lower than the "Resting Heart Rate Before" due to the effects of exercise).
Relevance: This variable is essential for understanding how exercise affects heart rate over time, and it can be used to perform a dependent t-test analysis.

Max Heart Rate During Exercise:
Description: The maximum heart rate the individual reached during exercise sessions.
Type: Integer
Range: 120 - 190 bpm
Relevance: This metric helps in understanding cardiovascular strain during exercise and can be linked to exercise frequency or fitness levels.

Potential Uses:

Dependent T-Test Analysis: The dataset is particularly suited for a dependent (paired) t-test where you compare the resting heart rate before and after the exercise program for each individual.

Exploratory Data Analysis (EDA): Investigate relationships between sleep, exercise frequency, and changes in heart rate. Potential analyses include correlations between sleep hours and resting heart rate improvement, or regression analyses to predict heart rate after exercise.

Machine Learning: Use the dataset for predictive modeling, and build a beginner regression model to predict post-exercise heart rate using age, sleep, and exercise frequency as features.

Health and Fitness Insights: This dataset can be useful for studying how different factors like sleep and age influence heart rate changes and overall cardiovascular health.
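The dependent (paired) t-test mentioned in the uses above can be sketched with the standard library alone. The five before/after values below are made up for illustration and are not rows from heart_rate_data.csv. Only the t statistic and degrees of freedom are computed here; a p-value would need a t-distribution routine from a statistics library.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    std_err = stdev(diffs) / math.sqrt(n)  # sample std. dev. of the differences
    return mean(diffs) / std_err, n - 1


# Hypothetical resting heart rates (bpm) for five individuals.
hr_before = [72, 80, 65, 90, 77]
hr_after = [68, 75, 66, 84, 72]

t_stat, dof = paired_t(hr_before, hr_after)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A negative t statistic here means the post-program resting heart rate is lower on average. The function would divide by zero if every pair had the same difference, which real data rarely does but a simulated dataset might.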

License: Choose an appropriate open license, such as:

CC BY 4.0 (Attribution 4.0 International).

Inspiration for Kaggle Users:
How does exercise frequency influence the reduction in resting heart rate?
Is there a relationship between sleep and heart rate improvements post-exercise?
Can we predict the post-exercise heart rate using other health variables?
How do age and exercise frequency interact to affect heart rate?
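As a starting point for the prediction question, a one-feature least-squares line is about the simplest model that could be tried. The sketch below regresses post-exercise resting heart rate on exercise days per week using made-up values, so the numbers say nothing about the actual dataset.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values: exercise days per week and resting heart rate after (bpm).
exercise_days = [1, 2, 3, 4, 5]
hr_after = [78, 75, 73, 70, 66]

slope, intercept = fit_line(exercise_days, hr_after)
predicted_6_days = slope * 6 + intercept
print(f"slope = {slope:.2f} bpm per extra day, intercept = {intercept:.2f} bpm")
```

Adding age and sleep hours as further features turns this into a multiple regression, which is usually easier to fit with a numerical library than by hand.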

Acknowledgments: This is a simulated dataset for educational purposes, generated to demonstrate statistical and machine learning applications in the field of health analytics.
Data from: MSL MARS SAMPLE ANALYSIS AT MARS 4 RDR LEVEL 1B V1.0
catalog.data.gov
Updated Aug 22, 2025
Cite
National Aeronautics and Space Administration (2025). MSL MARS SAMPLE ANALYSIS AT MARS 4 RDR LEVEL 1B V1.0 [Dataset]. https://catalog.data.gov/dataset/msl-mars-sample-analysis-at-mars-4-rdr-level-1b-v1-0-4232c
Explore at:
Dataset updated
Aug 22, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
The MSL SAM Level 1B data set is generated by applying corrections to Level 1A data, e.g. detector dead time, TCD temperature, noise removal, corrections for saturation and instrument response function.
Automated particle analysis (SEM/EDS) data from samples known to have been...
catalog.data.gov
s.cnmilf.com
+2 more
Updated Jul 29, 2022
Cite
National Institute of Standards and Technology (2022). Automated particle analysis (SEM/EDS) data from samples known to have been exposed to gunshot residue and from samples occasionally mistaken for gunshot residue - like brake dust and fireworks. [Dataset]. https://catalog.data.gov/dataset/automated-particle-analysis-sem-eds-data-from-samples-known-to-have-been-exposed-to-gunsho
Explore at:
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
Automated particle analysis (SEM/EDS) data from samples known to have been exposed to gunshot residue and from samples occasionally mistaken for gunshot residue - like brake dust and fireworks. The dataset consists of analyses of 30 discrete samples: 12 from sampling automobiles ("brake dust"), 10 from sampling fireworks ("sparklers" and "spinners" and "roman candles"), 8 from shooter's left or right hands. The analysis configuration meta-data for each analysis are contained in the "configuration.txt" and "script.py" files. The raw data from each analysis is in the file pair "data.pxz" and "data.hdz". The HDZ-file details the contents of the PXZ-file. In addition, the "mag0" directory contains TIFF images with embedded X-ray spectra for each particle in the dataset. Additional HDZ/PXZ files contain the results of reprocessing the "data.hdz/.pxz" in light of the "mag0" spectra and the standard spectra in "25 keV.zip". The samples came from Amy Reynolds (amy.reynolds@pd.boston.gov) at the Boston Police Department. The "Shooter" samples were taken from a volunteer who fired a gun at a local firing range and was then sampled immediately after. They are part of a time series that was used to study GSR retention. The TIFF Image/Spectrum files can be read using NIST DTSA-II (https://www.nist.gov/services-resources/software/nist-dtsa-ii) or NeXLSpectrum.jl (https://doi.org/10.18434/M32286). The HDZ/PXZ files can be read using NIST Graf (available on request) or NeXLParticle.jl (https://github.com/usnistgov/NeXLParticle.jl).