Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.
- Country: Name of the country.
- Density (P/Km2): Population density measured in persons per square kilometer.
- Abbreviation: Abbreviation or code representing the country.
- Agricultural Land (%): Percentage of land area used for agricultural purposes.
- Land Area (Km2): Total land area of the country in square kilometers.
- Armed Forces Size: Size of the armed forces in the country.
- Birth Rate: Number of births per 1,000 population per year.
- Calling Code: International calling code for the country.
- Capital/Major City: Name of the capital or major city.
- CO2 Emissions: Carbon dioxide emissions in tons.
- CPI: Consumer Price Index, a measure of inflation and purchasing power.
- CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
- Currency_Code: Currency code used in the country.
- Fertility Rate: Average number of children born to a woman during her lifetime.
- Forested Area (%): Percentage of land area covered by forests.
- Gasoline_Price: Price of gasoline per liter in local currency.
- GDP: Gross Domestic Product, the total value of goods and services produced in the country.
- Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
- Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
- Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
- Largest City: Name of the country's largest city.
- Life Expectancy: Average number of years a newborn is expected to live.
- Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
- Minimum Wage: Minimum wage level in local currency.
- Official Language: Official language(s) spoken in the country.
- Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
- Physicians per Thousand: Number of physicians per thousand people.
- Population: Total population of the country.
- Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
- Tax Revenue (%): Tax revenue as a percentage of GDP.
- Total Tax Rate: Overall tax burden as a percentage of commercial profits.
- Unemployment Rate: Percentage of the labor force that is unemployed.
- Urban Population: Percentage of the population living in urban areas.
- Latitude: Latitude coordinate of the country's location.
- Longitude: Longitude coordinate of the country's location.
- Analyze population density and land area to study spatial distribution patterns.
- Investigate the relationship between agricultural land and food security.
- Examine carbon dioxide emissions and their impact on climate change.
- Explore correlations between economic indicators such as GDP and various socio-economic factors.
- Investigate educational enrollment rates and their implications for human capital development.
- Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
- Study labor market dynamics through indicators such as labor force participation and unemployment rates.
- Investigate the role of taxation and its impact on economic development.
- Explore urbanization trends and their social and environmental consequences.
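The listed fields lend themselves to quick exploratory work in pandas. The sketch below illustrates two of the use cases above (GDP versus life expectancy, and a population-density ranking); the file name and the assumption that the GDP and Population columns parse as numeric are hypothetical, so adjust them to the actual release.

import pandas as pd

# Hypothetical file name; column headers follow the field list above and may need
# cleaning (e.g. stripping "$" and thousands separators) before they parse as numbers.
df = pd.read_csv("world-data.csv")

# Derive GDP per capita and check its correlation with life expectancy.
df["GDP per capita"] = df["GDP"] / df["Population"]
print(df["GDP per capita"].corr(df["Life Expectancy"]))

# Ten most densely populated countries, with land area for context.
top = df.sort_values("Density (P/Km2)", ascending=False).head(10)
print(top[["Country", "Density (P/Km2)", "Land Area (Km2)"]])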
Data Source: This dataset was compiled from multiple data sources
If this was helpful, a vote is appreciated ❤️ Thank you 🙂
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A diverse selection of 1000 empirical time series, along with results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.
This Global Summaries dataset, known as GSOY for Yearly, contains a yearly resolution of meteorological elements from 1763 to present with updates applied weekly. The major parameters are: average annual temperature, average annual minimum and maximum temperatures; total annual precipitation and snowfall; departure from normal of the mean temperature and total precipitation; heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; extreme annual minimum and maximum temperatures; number of days with fog; and number of days with thunderstorms. The primary input data source is the Global Historical Climatology Network - Daily (GHCN-Daily) dataset. The Global Summaries datasets also include a monthly resolution of meteorological elements in the GSOM (for Monthly) dataset. See associated resources for more information. These datasets are not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias corrected temperature data in GHCN-Monthly, which are not available in GSOM and GSOY. The GSOM and GSOY datasets replace the legacy U.S. COOP Summaries (DSI-3220), and have been expanded to include non-U.S. (global) stations. U.S. COOP Summaries (DSI-3220) only includes National Weather Service (NWS) COOP Published, or "Published in CD", sites.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources, with corrected systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes, such as daily mortality and fatality rates, to the normal attributes of official data sources. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), the World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing PDF reports, metadata, and reference data. The combined dataset includes complete spatial data such as country areas, international country numbers, Alpha-2 codes, Alpha-3 codes, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections on the referenced data sets and official reports, such as adjustments to the reporting dates, which suffered from a one- to two-day lag, removing negative values, detecting unreasonable changes in historical data in new reports, and corrections of systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data to the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website. This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline of confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, or the pandemic's turning point, as well as in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy and schools, alleviating business and social distancing restrictions, designing economic programs, or allowing sports events to resume.
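The paired comparison by root mean square error described above can be sketched in a few lines of pandas. The frame and column names below (country, date, new_cases) are placeholders for whichever two official collections (e.g. WHO versus ECDC) are being compared, not the dataset's actual schema.

import pandas as pd

def paired_rmse(df_a: pd.DataFrame, df_b: pd.DataFrame, value: str = "new_cases") -> pd.Series:
    # RMSE between two sources for the same attribute, computed per country.
    merged = df_a.merge(df_b, on=["country", "date"], suffixes=("_a", "_b"))
    sq_err = (merged[f"{value}_a"] - merged[f"{value}_b"]) ** 2
    return sq_err.groupby(merged["country"]).mean().pow(0.5).sort_values(ascending=False)

# Countries with the largest disagreement between two sources:
# print(paired_rmse(who_df, ecdc_df).head(10))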
The global summaries data set contains a monthly (GSOM) resolution of meteorological elements (max temp, snow, etc.) from 1763 to present with updates weekly. The major parameters are: monthly mean maximum, mean minimum and mean temperatures; monthly total precipitation and snowfall; departure from normal of the mean temperature and total precipitation; monthly heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; and extreme daily temperature and precipitation amounts. The primary source data set is the Global Historical Climatology Network (GHCN)-Daily data set. The global summaries data set also contains a yearly (GSOY) resolution of meteorological elements. See associated resources for more information. This data is not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias corrected temperature data in GHCN-Monthly, which are not available in GSOM and GSOY. The GSOM and GSOY data sets replace the legacy DSI-3220 and expand it to include non-U.S. (a.k.a. global) stations. DSI-3220 only included National Weather Service (NWS) COOP Published, or "Published in CD", sites.
https://choosealicense.com/licenses/other/
Dataset Card for "imdb"
Dataset Summary
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
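Although the card is truncated here, the corpus loads directly with the Hugging Face datasets library; the split names (train, test, unsupervised) and fields (text, label) below follow the standard IMDB configuration and are worth confirming against the dataset page.

from datasets import load_dataset

imdb = load_dataset("stanfordnlp/imdb")        # splits: train, test, unsupervised
print(imdb)                                    # number of examples per split
sample = imdb["train"][0]
print(sample["label"], sample["text"][:200])   # label 0 = negative, 1 = positive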
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Sentiment Analysis Dataset is a dataset for sentiment analysis, including large-scale tweet text collected from Twitter and an emotional polarity label (0 = negative, 2 = neutral, 4 = positive) for each tweet, featuring automatic labeling based on emoticons.
2) Data Utilization
(1) The Sentiment Analysis Dataset has the following characteristics:
• Each sample consists of six columns: emotional polarity, tweet ID, date of writing, search word, author, and tweet body, making it suitable for training natural language processing and classification models on tweet text and emotion labels.
(2) The Sentiment Analysis Dataset can be used for:
• Sentiment classification model development: Using tweet text and emotional polarity labels, automatic positive, negative, and neutral classification models can be built with various machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM, as sketched below.
• Analysis of SNS public opinion and trends: By analyzing the distribution of emotions by time series and keywords, changes in public opinion on specific issues or brands, positive and negative trends, and key emotional keywords can be explored.
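The sketch below shows a baseline polarity classifier of the kind mentioned above (TF-IDF features plus logistic regression). The file name and the column order are assumptions taken from the six-column description, not from the actual release.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed column order: polarity, tweet ID, date, search word, author, tweet body.
cols = ["polarity", "tweet_id", "date", "query", "user", "text"]
df = pd.read_csv("sentiment_tweets.csv", names=cols, encoding="latin-1")

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["polarity"], test_size=0.2, random_state=0
)

vec = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(vec.transform(X_test))))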
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid development of data acquisition and storage, massive datasets with large sample sizes are emerging at an increasing rate, making more advanced statistical tools urgently needed. To accommodate such big volumes in the analysis, a variety of methods have been proposed for complete or right-censored survival data. However, existing developments in big data methodology have not attended to interval-censored outcomes, which are ubiquitous in cross-sectional or periodical follow-up studies. In this work, we propose an easily implemented divide-and-combine approach for analyzing massive interval-censored survival data under the additive hazards model. We establish the asymptotic properties of the proposed estimator, including consistency and asymptotic normality. In addition, the divide-and-combine estimator is shown to be asymptotically equivalent to the full-data-based estimator obtained from analyzing all data together. Simulation studies suggest that, relative to the full-data-based approach, the proposed divide-and-combine approach has a clear advantage in terms of computation time, making it more applicable to large-scale data analysis. An application to a set of interval-censored data also demonstrates the practical utility of the proposed method.
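The splitting-and-pooling logic of a divide-and-combine estimator is easy to sketch in outline. In the snippet below, fit_block is a hypothetical placeholder for whatever routine fits the additive hazards model to one block of interval-censored data (no off-the-shelf implementation is implied by the abstract); only the generic inverse-variance pooling step is illustrated.

import numpy as np

def fit_block(block):
    # Hypothetical per-block estimator: should return (coefficient vector, covariance matrix).
    raise NotImplementedError

def divide_and_combine(blocks):
    # Pool block-level estimates with inverse-variance weights.
    precisions, weighted = [], []
    for block in blocks:
        coef, cov = fit_block(block)
        prec = np.linalg.inv(cov)
        precisions.append(prec)
        weighted.append(prec @ coef)
    pooled_cov = np.linalg.inv(np.sum(precisions, axis=0))
    pooled_coef = pooled_cov @ np.sum(weighted, axis=0)
    return pooled_coef, pooled_cov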
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for DIALOGSum Corpus
Dataset Description
Links
Homepage: https://aclanthology.org/2021.findings-acl.449
Repository: https://github.com/cylnlp/dialogsum
Paper: https://aclanthology.org/2021.findings-acl.449
Point of Contact: https://huggingface.co/knkarthick
Dataset Summary
DialogSum is a large-scale dialogue summarization dataset, consisting of 13,460 (Plus 100 holdout data for topic generation) dialogues with corresponding… See the full description on the dataset page: https://huggingface.co/datasets/knkarthick/dialogsum.
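A minimal sketch for loading the corpus from the Hub is shown below; the field names (dialogue, summary, topic) follow the dataset card and should be re-checked against the linked page.

from datasets import load_dataset

dialogsum = load_dataset("knkarthick/dialogsum")   # splits: train, validation, test
example = dialogsum["train"][0]
print(example["dialogue"][:200])
print("Summary:", example["summary"])
print("Topic:", example["topic"])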
Beginning with tax year 2015, the Department of Taxation and Finance (hereafter "the Department") began producing a new annual population data study file to provide more comprehensive statistical information on New York State personal income tax returns. The data are from full-year resident, nonresident, and part-year resident returns filed between January 1 and December 31 of the year after the start of the liability period (hereafter referred to as the "processing year"). The four datasets display major income tax components by tax year. This includes the distribution of New York adjusted gross income and tax liability by county or place of residence, as well as the value of deductions, exemptions, taxable income and tax before credits by size of income. In addition, three of the four datasets include all the components of income, the components of deductions, and the addition/subtraction modifications. Caution: The current datasets are based on population data. For tax years prior to 2015, data were based on sample data. Data customers are advised to use caution when drawing conclusions comparing data for tax years prior to 2015 and subsequent tax years. Further details are included in the Overview.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recent developments have led to an enormous increase in publicly available large genomic data, including complete genomes. The 1000 Genomes Project was a major contributor, releasing the results of sequencing a large number of individual genomes and allowing for a myriad of large-scale studies on human genetic variation. However, the tools currently available are insufficient for some analyses of data sets encompassing more than hundreds of base pairs, particularly when considering haplotype sequences of single nucleotide polymorphisms (SNPs). Here, we present a new and powerful tool for dealing with large data sets that allows the computation of a variety of summary statistics of population genetic data while increasing the speed of data analysis.
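As an illustration of the kind of summary statistic such a tool computes, nucleotide diversity (pi) over a 0/1 SNP haplotype matrix takes only a few lines of NumPy. This is a generic textbook calculation on toy data, not the tool described in the abstract.

import numpy as np

def nucleotide_diversity(haplotypes: np.ndarray) -> float:
    # Average pairwise difference per site; haplotypes has shape (n_haplotypes, n_sites).
    n, n_sites = haplotypes.shape
    p = haplotypes.mean(axis=0)                      # derived-allele frequency per site
    pairwise_diffs = 2.0 * p * (1.0 - p) * n / (n - 1)
    return pairwise_diffs.sum() / n_sites

rng = np.random.default_rng(0)
haps = rng.integers(0, 2, size=(20, 1_000))          # toy data: 20 haplotypes, 1,000 SNPs
print(nucleotide_diversity(haps))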
The empirical analysis of monetary policy requires the construction of instruments for future expected inflation. Dynamic factor models have been applied rather successfully to inflation forecasting. In fact, two competing methods have recently been developed to estimate large-scale dynamic factor models based, respectively, on static and dynamic principal components. This paper combines the econometric literature on dynamic principal components and the empirical analysis of monetary policy. We assess the two competing methods for extracting factors on the basis of their success in instrumenting future expected inflation in the empirical analysis of monetary policy. We use two large data sets of macroeconomic variables for the USA and for the Euro area. Our results show that estimated factors do provide a useful parsimonious summary of the information used in designing monetary policy.
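The static principal-components step can be sketched as below: standardize the macro panel, extract a handful of factors, and regress future inflation on them. The panel file, the inflation column name, and the 12-month horizon are assumptions for illustration, not the paper's exact specification.

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

panel = pd.read_csv("macro_panel.csv", index_col=0, parse_dates=True)  # T x N panel (hypothetical file)
X = StandardScaler().fit_transform(panel.values)

factors = PCA(n_components=5).fit_transform(X)   # static principal components

h = 12                                           # forecast horizon in months
y = panel["inflation"].shift(-h)                 # future inflation to instrument
mask = y.notna().values
reg = LinearRegression().fit(factors[mask], y[mask])
print("in-sample R^2:", reg.score(factors[mask], y[mask]))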
https://www.archivemarketresearch.com/privacy-policy
The AI Training Dataset in Healthcare Market size was valued at USD 341.8 million in 2023 and is projected to reach USD 1,464.13 million by 2032, exhibiting a CAGR of 23.1% during the forecast period. The growth is attributed to the rising adoption of AI in healthcare, increasing demand for accurate and reliable training datasets, government initiatives to promote AI in healthcare, and technological advancements in data collection and annotation. These factors are contributing to the expansion of the AI Training Dataset in Healthcare Market. Healthcare AI training datasets are vital for building effective algorithms and enhancing patient care and diagnosis in the industry. These datasets include large volumes of electronic health records, images such as X-ray and MRI scans, and genomics data, all of which are thoroughly labeled. They help AI systems identify trends, make forecasts, and even assist in developing unique approaches to treating disease. However, patient privacy and the ethical use of patient information are of the utmost importance, requiring high levels of anonymization and compliance with laws such as HIPAA. Ongoing expansion and variety of datasets are crucial to address existing bias and improve the efficiency of AI across different populations and diseases, providing safer solutions for global health.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A large data set of go-arounds, also referred to as missed approaches. The data set is in support of the paper presented at the OpenSky Symposium on November 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33,000 GAs. The data was collected from the OpenSky Network's historical database for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing, a minimal amount of information about the landing, and whether it was a GA. The data is structured in the following way (column name, type, description):
- time (date time): UTC time of landing or first GA attempt
- icao24 (string): Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
- callsign (string): Aircraft identifier in air-ground communications
- airport (string): ICAO airport code where the aircraft is landing
- runway (string): Runway designator on which the aircraft landed
- has_ga (string): "True" if at least one GA was performed, otherwise "False"
- n_approaches (integer): Number of approaches identified for this flight
- n_rwy_approached (integer): Number of unique runways approached by this flight
The last two columns, n_approaches and n_rwy_approached, are useful to filter out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to remove all flights with n_approaches > 2.
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing, additional information about the landing, and whether it was a GA. The data is structured in the following way (column name, type, description):
- time (date time): UTC time of landing or first GA attempt
- icao24 (string): Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
- callsign (string): Aircraft identifier in air-ground communications
- airport (string): ICAO airport code where the aircraft is landing
- runway (string): Runway designator on which the aircraft landed
- has_ga (string): "True" if at least one GA was performed, otherwise "False"
- n_approaches (integer): Number of approaches identified for this flight
- n_rwy_approached (integer): Number of unique runways approached by this flight
- registration (string): Aircraft registration
- typecode (string): Aircraft ICAO typecode
- icaoaircrafttype (string): ICAO aircraft type
- wtc (string): ICAO wake turbulence category
- glide_slope_angle (float): Angle of the ILS glide slope in degrees
- has_intersection (string): Boolean that is true if the runway has another runway intersecting it, otherwise false
- rwy_length (float): Length of the runway in kilometres
- airport_country (string): ISO Alpha-3 country code of the airport
- airport_region (string): Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
- operator_country (string): ISO Alpha-3 country code of the operator
- operator_region (string): Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
- wind_speed_knts (integer): METAR, surface wind speed in knots
- wind_dir_deg (integer): METAR, surface wind direction in degrees
- wind_gust_knts (integer): METAR, surface wind gust speed in knots
- visibility_m (float): METAR, visibility in metres
- temperature_deg (integer): METAR, temperature in degrees Celsius
- press_sea_level_p (float): METAR, sea level pressure in hPa
- press_p (float): METAR, QNH in hPa
- weather_intensity (list): METAR, list of present weather codes: qualifier - intensity
- weather_precipitation (list): METAR, list of present weather codes: weather phenomena - precipitation
- weather_desc (list): METAR, list of present weather codes: qualifier - descriptor
- weather_obscuration (list): METAR, list of present weather codes: weather phenomena - obscuration
- weather_other (list): METAR, list of present weather codes: weather phenomena - other
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different websites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway pair, i.e. every runway at every airport for which data is available. The data is structured in the following way (column name, type, description):
- airport (string): ICAO airport code where the aircraft is landing
- runway (string): Runway designator on which the aircraft landed
- n_landings (integer): Total number of landings observed on this runway in 2019
- ga_rate (float): Go-around rate, per 1000 landings
- glide_slope_angle (float): Angle of the ILS glide slope in degrees
- has_intersection (string): Boolean that is true if the runway has another runway intersecting it, otherwise false
- rwy_length (float): Length of the runway in kilometres
- airport_country (string): ISO Alpha-3 country code of the airport
- airport_region (string): Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
This aggregated data set is used in the paper for the generalized linear regression model.
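One natural way to reproduce a model of this kind from the aggregated file is a Poisson GLM of go-around counts on runway characteristics, with the number of landings as the exposure. The statsmodels formulation below is only an illustrative sketch under that assumption, not the exact specification used in the paper.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

agg = pd.read_csv("go_arounds_agg.csv.gz")
# Recover approximate go-around counts from the rate (per 1000 landings).
agg["n_ga"] = (agg["ga_rate"] * agg["n_landings"] / 1000).round()

model = smf.glm(
    "n_ga ~ glide_slope_angle + rwy_length + has_intersection + airport_region",
    data=agg,
    family=sm.families.Poisson(),
    exposure=agg["n_landings"],
).fit()
print(model.summary())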
Downloading the trajectories
Users of this data set with access to the OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. For example, suppose you want to get all the go-arounds on the 4th of January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:
import datetime
from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(
    tzinfo=datetime.timezone.utc
)
stop = datetime.datetime(year=2019, month=1, day=5).replace(
    tzinfo=datetime.timezone.utc
)

df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

flights = []
delta_time = pd.Timedelta(minutes=10)
for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
    # take at most 10 minutes before and 10 minutes after the landing or go-around
    start_time = row["time"] - delta_time
    stop_time = row["time"] + delta_time

    # fetch the data from OpenSky Network
    flights.append(
        opensky.history(
            start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
            stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
            callsign=row["callsign"],
            return_flight=True,
        )
    )

Traffic.from_flights(flights)
Additional files
Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:
validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually; the rest is generated automatically.
validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.
This dataset provides the adjusted length of stay, type of care, discharges with valid charges, charges by hospital, licensure of bed, and Major Diagnostic Category (MDC).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Commercial reference buildings provide complete descriptions for whole building energy analysis using EnergyPlus (see "About EnergyPlus" resource link) simulation software. Included here is data pertaining to the reference building type "Large Hotel" for each of the 16 climate zones described on the Wiki page (see "OpenEI Wiki Page for Commercial Reference Buildings" resource link), and each of three construction categories: new (2004) construction, post-1980 construction existing buildings, and pre-1980 construction existing buildings.
The dataset includes four key components: building summary, zone summary, location summary and a picture. Building summary includes details about: form, fabric, and HVAC. Zone summary includes details such as: area, volume, lighting, and occupants for all types of zones in the building. Location summary includes key building information as it pertains to each climate zone, including: fabric and HVAC details, utility costs, energy end use, and peak energy demand.
In total, DOE developed 16 reference building types that represent approximately 70% of commercial buildings in the U.S.; for each type, building models are available for each of the three construction categories. The commercial reference buildings (formerly known as commercial building benchmark models) were developed by the U.S. Department of Energy (DOE), in conjunction with three of its national laboratories.
Additional data is available directly from DOE's Energy Efficiency & Renewable Energy (EERE) website (see "About Commercial Buildings" resource link), including EnergyPlus software input files (.idf) and results of the EnergyPlus simulations (.html).
Note: There have been many changes and improvements since this dataset was released. Several revisions have been made to the models, and the project has moved to a different approach to representing typical building energy consumption. For current data on building energy consumption, please see the ComStock resource below.
The "DialogSum Corpus" is a comprehensive dataset designed for dialogue summarization and topic generation research. It is organized into two distinct folders, one containing CSV files and the other containing the same data as JSONL files.
DialogSum Corpus serves as an extensive repository for dialogue summarization research. Each entry in this dataset offers insights into a wide range of conversational scenarios, capturing interactions among individuals engaged in various everyday life discussions. The dialogues encompass a diverse spectrum of topics, covering areas such as schooling, work, medication, shopping, leisure, travel, and more. These conversations unfold in different real-life settings, featuring exchanges between friends, colleagues, customers, and service providers.
The dataset is exclusively presented in the English language.
DialogSum Corpus is thoughtfully organized into distinct data instances across the CSV and JSONL formats. It comprises a total of 12,960 dialogues, including an additional 1,500 dialogues specifically allocated for testing purposes. The dataset is categorized into conventional train, test, and validation subsets, ensuring a well-balanced distribution for effective model assessment. A representative example from the training set is illustrated below:
The dataset encompasses three core fields:
- id: A unique identifier assigned to each dialogue instance (named fname in the JSONL files).
- dialogue: The transcribed text of the dialogue itself.
- summary: A human-authored concise summary encapsulating the key aspects of the dialogue.
- topic: A succinct one-liner capturing the central theme of the dialogue.
The DialogSum Corpus dataset is divided into distinct subsets as follows:
- Train: 12,460 dialogues
- Validation: 500 dialogues
- Test: 1,500 dialogues
- hiddentest_dialogue: 100 dialogues (featuring only 'id' and 'dialogue' fields)
- hiddentest_topic: 100 dialogues (featuring only 'id' and 'topic' fields)
The creation of the DialogSum Corpus involved a meticulous curation process. Dialogue data were sourced from publicly available dialogue corpora, including DailyDialog, DREAM, MuTual, and an English speaking practice website. Annotators, who are language experts, were tasked with summarizing each dialogue based on specific criteria. These criteria encompass conveying salient information, ensuring brevity, preserving important named entities, and adhering to a formal language style.
The dataset is released under the MIT License, enabling its utilization across a broad spectrum of applications.
For proper citation of this dataset in academic and research contexts, the following reference can be used:
@inproceedings{chen-etal-2021-dialogsum,
title = "{D}ialog{S}um: {A} Real-Life Scenario Dialogue Summarization Dataset",
author = "Chen, Yulong and
Liu, Yang and
Chen, Liang and
Zhang, Yue",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-acl.449",
doi = "10.18653/v1/2021.findings-acl.449",
pages = "5062--5074"
}
The DialogSum Corpus serves as a valuable resource for researchers and practitioners engaged in dialogue summarization and topic generation, providing a diverse collection of real-life conversational data to explore and advance the field.
This global summary of the day and month data set is obtained on a delayed monthly basis from the Climate Prediction Center (CPC) of the National Centers for Environmental Prediction (NCEP). CPC extracts surface synoptic weather observations from the Global Telecommunications System (GTS) and performs limited automated validation of the parameters. The data is then summarized for all reporting stations on a daily basis according to current operational requirements related to the assessment of crop and energy production.
Data coverage begins in 1979. In 1987 there is a format change and additional parameters were added. Major parameters include maximum temperature, minimum temperature, precipitation, vapor pressure, sea level pressure, maximum relative humidity, and minimum relative humidity. If the maximum or minimum temperatures are not reported, they are estimated from reported air temperatures in the regular synoptic reports when sufficient data exist. Starting in 1994, total sky cover, 3-hourly wind direction and speed, and total snow depth are included. There are approximately 8900 actively reporting stations. Periods of record vary widely among the stations.
CAUTIONARY NOTE: NCEP incorrectly decoded the wind units indicator from February 1, 2001 until 1500 UTC on June 11, 2002, which caused a knots versus meters-per-second problem. Not all stations were affected. Users may, with caution, apply the knots or meters-per-second conversion where it appears to be the correct choice.
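For records affected by the decoding problem, the conversion itself is simple (1 knot = 0.514444 m/s); a small helper like the one below can be applied selectively where the knots interpretation appears to be the correct one.

KNOTS_TO_MPS = 0.514444

def knots_to_mps(speed_knots: float) -> float:
    # Convert a wind speed reported in knots to metres per second.
    return speed_knots * KNOTS_TO_MPS

print(knots_to_mps(10.0))   # 5.14 m/s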
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic), which are represented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated for the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, when values in the original table at the beginning of the dataset are added and/or changed, most of the subsequent tables are automatically recalculated and the graphs are updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in the risk assessment tables and obtaining automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and following the second wave of the pandemic to check the reliability of the pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.