35 datasets found
  1. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the top 10 countries leading in COVID-19 incidence worldwide as of October 22, 2020 (on the eve of the second wave of the pandemic) and represented in the Global 500 ranking for 2020 were selected: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, up to 10 of the largest transnational corporations included in the Global 500 rating were selected separately for 2020 and 2019. Arithmetic averages were calculated, along with the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators across all countries in the sample were then found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data are collected in a single Microsoft Excel table.

    The dataset is a unique database that combines COVID-19 statistics with entrepreneurship statistics, and it is flexible: it can be supplemented with data from other countries and newer statistics on the pandemic. Because the cells contain formulas rather than ready-made numbers, adding or changing values in the original table at the beginning of the workbook automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data but also charts that provide data visualization.

    The dataset contains both actual and forecast data on COVID-19 morbidity and mortality for the period of the second wave of the pandemic in 2020. The forecasts are presented as a normal distribution of predicted values with the probability of their occurrence in practice. This permits broad scenario analysis: substituting various predicted morbidity and mortality rates into the risk-assessment tables yields automatically calculated consequences (changes) for the characteristics of international entrepreneurship. Actual values identified during and after the second wave can also be substituted to check the reliability of the forecasts and conduct a plan-versus-actual analysis. The dataset contains not only the numerical initial and predicted values of the studied indicators but also their qualitative interpretation, reflecting the presence and level of risk that the pandemic and COVID-19 crisis pose to international entrepreneurship.
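    The forecast tables' recalculation logic can be reproduced outside Excel. A minimal Python sketch (all numbers hypothetical, not taken from the dataset) that draws scenario values from a normal distribution of predicted incidence and recomputes a derived business indicator:

```python
from statistics import NormalDist

# Hypothetical forecast: predicted COVID-19 incidence per 100k,
# modeled as a normal distribution (mean and sigma are made up).
forecast = NormalDist(mu=250, sigma=40)

# Probability that incidence stays below a chosen threshold.
p_below_300 = forecast.cdf(300)

# Scenario values at fixed quantiles rather than random draws,
# so the scenario analysis is reproducible.
scenarios = [forecast.inv_cdf(q) for q in (0.1, 0.5, 0.9)]

# Toy "consequence" formula (an assumption, not the dataset's):
# each extra case per 100k shaves 0.01 percentage points off
# average corporate profitability.
baseline_profitability = 8.0  # percent, hypothetical
impacts = [baseline_profitability - 0.01 * s for s in scenarios]
```

    Substituting a different mean or sigma plays the same role as editing the source cells in the workbook: every downstream value is recomputed automatically.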

  2. Pivot table - Data analysis project

    • kaggle.com
    Updated Jul 18, 2022
    Cite
    Gamal Khattab (2022). Pivot table - Data analysis project [Dataset]. https://www.kaggle.com/datasets/gamalkhattab/pivot-table
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 18, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gamal Khattab
    Description

    Summarize big data with pivot table and charts and slicers
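    For readers who want the same aggregation outside Excel, a minimal pure-Python sketch of what a pivot table computes: a value field summed over a row field and a column field (the sales rows below are hypothetical):

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
rows = [
    ("East", "A", 100), ("East", "B", 50),
    ("West", "A", 70),  ("West", "B", 30),
    ("East", "A", 25),
]

# Pivot: sum of amount by region (rows) and product (columns),
# the same aggregation an Excel pivot table performs.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, amount in rows:
    pivot[region][product] += amount
```

    Slicers correspond to filtering `rows` before the loop; charts are then drawn from the aggregated `pivot` values.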

  3. Cleaned-Data Pakistan's Largest Ecommerce Dataset

    • kaggle.com
    Updated Mar 25, 2023
    Cite
    umaraziz97 (2023). Cleaned-Data Pakistan's Largest Ecommerce Dataset [Dataset]. https://www.kaggle.com/datasets/umaraziz97/cleaned-data-pakistans-largest-ecommerce-dataset
    Available formats: Croissant
    Dataset updated
    Mar 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    umaraziz97
    Area covered
    Pakistan
    Description

    Pakistan’s largest ecommerce data – Power BI Report

    Dataset Link: pakistan’s_largest_ecommerce_dataset
    Cleaned Data: Cleaned_Pakistan’s_largest_ecommerce_dataset

    Raw Data:

    Rows: 584525, Columns: 21

    Process:

    All the raw data was transformed and saved in a new Excel file, Working – Pakistan Largest Ecommerce Dataset

    Processed Data:

    Rows: 582250, Columns: 22. Visualization report: Pakistan-s-largest-ecommerce-data-Power-BI-Data-Visualization-Report

    Conclusion:

    The Mobiles & Tablets category makes the most money, selling the highest number of products while also providing the largest discounts. Men’s Fashion sells the second-highest number of products but does not generate money at the same rate; the prices of individual products may be the reason. In the order details, Mobiles & Tablets has the highest number of canceled orders, though its completed orders are almost the same as Men’s Fashion’s. Most orders are completed overall, but the number of canceled orders is large. Among payment methods, COD accounts for the most completed orders, while most canceled orders used the Easyaxis payment method.
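    The per-category revenue and cancellation counts behind conclusions like these can be recomputed with a few lines of Python; a sketch over hypothetical order rows (the real dataset's column names and values may differ):

```python
from collections import Counter

# Hypothetical order records: (category, status, amount).
orders = [
    ("Mobiles & Tablets", "complete", 500),
    ("Mobiles & Tablets", "canceled", 450),
    ("Men's Fashion", "complete", 40),
    ("Men's Fashion", "complete", 35),
    ("Mobiles & Tablets", "complete", 600),
]

revenue = Counter()   # completed-order revenue per category
canceled = Counter()  # canceled-order count per category
for category, status, amount in orders:
    if status == "complete":
        revenue[category] += amount
    elif status == "canceled":
        canceled[category] += 1

top_category, top_revenue = revenue.most_common(1)[0]
```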

  4. Excel dataset

    • kaggle.com
    zip
    Updated Jun 29, 2023
    Cite
    Pinky Verma (2023). Excel dataset [Dataset]. https://www.kaggle.com/datasets/pinkyverma0256/excel-dataset
    Available download formats: zip (13123 bytes)
    Dataset updated
    Jun 29, 2023
    Authors
    Pinky Verma
    Description

    Dataset

    This dataset was created by Pinky Verma


  5. Data from: New Data Reduction Tools and their Application to The Geysers...

    • data.wu.ac.at
    pdf
    Updated Dec 5, 2017
    + more versions
    Cite
    (2017). New Data Reduction Tools and their Application to The Geysers Geothermal Field [Dataset]. https://data.wu.ac.at/schema/geothermaldata_org/NjNiMTc2MzQtOWQ5Mi00MjE5LWEwOWQtZDFjMmE5YjcwZWM0
    Available download formats: pdf
    Dataset updated
    Dec 5, 2017
    Area covered
    The Geysers
    Description

    Microsoft Excel based (using Visual Basic for Applications) data-reduction and visualization tools have been developed that allow users to numerically reduce large sets of geothermal data to any size. The data can be quickly sifted through and graphed for study. The ability to analyze large data sets can reveal responses to field-management procedures that would otherwise be undetectable. Field-wide trends such as decline rates, response to injection, evolution of superheat, recording-instrumentation problems and data inconsistencies can be quickly queried and graphed. The application of these newly developed tools to data from The Geysers geothermal field is illustrated. A copy of these tools may be requested by contacting the authors.
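    The VBA tools' exact reduction method is not described; block-averaging is one common way to reduce a series "to any size", sketched here in Python with made-up readings:

```python
# Shrink a long series to roughly `target` points by averaging
# fixed-size blocks (an assumed method, not the tools' actual one).
def reduce_series(values, target):
    if len(values) <= target:
        return list(values)
    block = -(-len(values) // target)  # ceiling division
    return [
        sum(values[i:i + block]) / len(values[i:i + block])
        for i in range(0, len(values), block)
    ]

readings = list(range(1000))       # stand-in for geothermal data
small = reduce_series(readings, 100)
```

    The reduced series keeps the broad shape of the original (e.g. decline trends) while being small enough to graph quickly.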

  6. How Python Can Work For You

    • code-deegsnccu.hub.arcgis.com
    • cope-open-data-deegsnccu.hub.arcgis.com
    • +1 more
    Updated Aug 26, 2023
    Cite
    East Carolina University (2023). How Python Can Work For You [Dataset]. https://code-deegsnccu.hub.arcgis.com/items/6d5c27fa87564d52b0b753d4a3168ef1
    Dataset updated
    Aug 26, 2023
    Dataset authored and provided by
    East Carolina University
    Description

    Python is a free computer language that prioritizes human readability and general application. It is one of the easier computer languages to learn, especially with no prior programming knowledge. I have been using Python for Excel spreadsheet automation, data analysis, and data visualization, and it has allowed me to focus on automating my data-analysis workload. I am currently examining the North Carolina Department of Environmental Quality (NCDEQ) water quality sampling database for the Town of Nags Head, NC. It spans over 26 years (1997-2023) and currently lists 41 different testing site locations. As shown at the bottom of image 2 below, there are 148,204 testing data points for the entirety of the NCDEQ testing for the state. Of these, 34,759 data points are from Dare County (Nags Head) specifically, further subdivided into testing sites.
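    The statewide-to-county subsetting described above is a one-line filter in Python; a sketch using the standard csv module (the column names and rows here are assumptions, not NCDEQ's actual schema):

```python
import csv
import io

# A tiny hypothetical extract standing in for the 148,204-row
# statewide table; in practice this would be open("ncdeq.csv").
raw = io.StringIO(
    "county,site,value\n"
    "Dare,NH-01,4.2\n"
    "Wake,RA-03,3.8\n"
    "Dare,NH-02,5.1\n"
)

# Keep only the Dare County (Nags Head) rows.
dare = [row for row in csv.DictReader(raw) if row["county"] == "Dare"]
```

    The same filter, grouped further by the `site` column, reproduces the per-testing-site subdivision.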

  7. Data for "Direct and indirect Rod and Frame effect: A virtual reality study"...

    • data.mendeley.com
    Updated Feb 12, 2025
    Cite
    Michał Adamski (2025). Data for "Direct and indirect Rod and Frame effect: A virtual reality study" [Dataset]. http://doi.org/10.17632/pcf2n8b4rd.1
    Dataset updated
    Feb 12, 2025
    Authors
    Michał Adamski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the raw experimental data and supplementary materials for the study "Asymmetry Effects in Virtual Reality Rod and Frame Test". The materials included are:

    •  Raw Experimental Data: older.csv and young.csv
    •  Mathematica Notebooks: a collection of Mathematica notebooks used for data analysis and visualization. These notebooks provide scripts for processing the experimental data, performing statistical analyses, and generating the figures used in the project.
    •  Unity Package: a Unity package featuring a sample scene related to the project. The scene was built using Unity’s Universal Rendering Pipeline (URP). To utilize this package, ensure that URP is enabled in your Unity project. Instructions for enabling URP can be found in the Unity URP Documentation.
    

    Requirements:

    •  For Data Files: software capable of opening CSV files (e.g., Microsoft Excel, Google Sheets, or any programming language that can read CSV formats).
    •  For Mathematica Notebooks: Wolfram Mathematica software to run and modify the notebooks.
    •  For Unity Package: Unity Editor version compatible with URP (2019.3 or later recommended). URP must be installed and enabled in your Unity project.
    

    Usage Notes:

    •  The dataset facilitates comparative studies between different age groups based on the collected variables.
    •  Users can modify the Mathematica notebooks to perform additional analyses.
    •  The Unity scene serves as a reference to the project setup and can be expanded or integrated into larger projects.
    

    Citation: Please cite this dataset when using it in your research or publications.
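    As the usage notes say, the CSV files support comparisons between age groups. A minimal sketch of such a comparison in Python, using hypothetical stand-in values rather than the actual contents of older.csv and young.csv:

```python
from statistics import mean

# Hypothetical stand-ins for one measured variable per group
# (e.g. rod tilt error in degrees); the real files would be
# loaded with csv.DictReader over older.csv and young.csv.
older = [4.1, 5.2, 3.8, 6.0]
young = [2.9, 3.1, 2.5, 3.6]

# Group means and their difference: the simplest between-group
# comparison the usage notes mention.
gap = mean(older) - mean(young)
```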

  8. Data from: Soil Water Content Data for The Bushland, Texas, Winter Wheat...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Soil Water Content Data for The Bushland, Texas, Winter Wheat Experiments [Dataset]. https://catalog.data.gov/dataset/soil-water-content-data-for-the-bushland-texas-winter-wheat-experiments-bf85a
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Area covered
    Bushland, Texas
    Description

    [NOTE - 2022-09-07: this dataset is superseded by an updated version: https://doi.org/10.15482/USDA.ADC/1526332] This dataset contains soil water content data developed from neutron probe readings taken in access tubes in two of the four large, precision weighing lysimeters, and in the fields surrounding each lysimeter, that were planted to winter wheat at the USDA-ARS Conservation and Production Research Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL) beginning in 1989. Data in each spreadsheet are for one winter wheat growing season: 1989-1990, 1991-1992, or 1992-1993. Readings taken in those years for other crops are reported elsewhere. Data for the 1989-1990 and 1992-1993 seasons are from the northwest (NW) and southwest (SW) weighing lysimeters and surrounding fields; data for the 1991-1992 season are from the northeast (NE) and southeast (SE) weighing lysimeters and surrounding fields.

    Readings were taken periodically with a field-calibrated neutron probe at depths from 10 cm to 230 cm (maximum of 190 cm in the lysimeters) in 20-cm depth increments. Periods between readings were typically one to two weeks, sometimes longer according to experimental design and the need for data. Field calibrations in the Pullman soil series were done every few years and typically produced a regression equation with RMSE <= 0.01 m3 m-3 (e.g., Evett and Steiner, 1995). Data were used to guide irrigation scheduling to achieve full or deficit irrigation as required by the experimental design. Data may be used to calculate the soil profile water content in mm of water from the surface to the maximum depth of reading.

    Profile water content differences between reading times in the same access tube represent the change in soil water storage over the period and may be used to compute evapotranspiration (ET) with the soil water balance equation: ET = change in storage + P + I + F + R, where P is precipitation during the period, I is irrigation, F is soil water flux (drainage) out of the bottom of the soil profile, and R is the sum of runon and runoff. Typically, R is taken as zero because the fields were furrow-diked to prevent runon and runoff during most of each growing season.

    Resources in this dataset:

    • 1989-90 Bushland, TX, west winter wheat volumetric soil water content data (file: 1989-90_West_Winter-Wheat_Soil-water.xlsx): fields around the northwest (NW) and southwest (SW) lysimeters, and the lysimeters themselves.
    • 1991-92 Bushland, TX, east winter wheat volumetric soil water content data (file: 1991-92_East_Winter-Wheat_Soil-water.xlsx): fields around the northeast (NE) and southeast (SE) lysimeters, and the lysimeters themselves.
    • 1992-93 Bushland, TX, west winter wheat volumetric soil water content data (file: 1992-93_West_Winter-Wheat_Soil-water.xlsx): fields around the northwest (NW) and southwest (SW) lysimeters, and the lysimeters themselves.

    Each Excel file contains periodic volumetric soil water content data from neutron probe readings in 20-cm depth increments from 10-cm to 230-cm depth in access tubes in the surrounding fields, and to 190-cm depth in each lysimeter. Each file includes a data dictionary for every tab containing data, plus an Introduction tab that lists the authors, equipment used, and relevant citations, and explains the other tabs (data dictionaries, data, geographical coordinates of access tube locations, or data visualization tools). Tab names are unique so that tabs may be saved as individual CSV files.
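    The profile-storage and water-balance calculations described above can be sketched in a few lines of Python (all readings and water-balance inputs below are hypothetical):

```python
# Profile water storage (mm) from volumetric readings (m3/m3)
# taken in 20-cm (200-mm) depth increments, per the description.
def profile_storage_mm(theta, increment_mm=200):
    return sum(t * increment_mm for t in theta)

# Hypothetical readings at two dates, 12 depths (10-230 cm).
theta_t1 = [0.30] * 12
theta_t2 = [0.27] * 12

# Storage depletion over the period (initial minus final), so a
# drying profile contributes positively to ET.
storage_change = profile_storage_mm(theta_t1) - profile_storage_mm(theta_t2)

# Soil water balance per the dataset description:
# ET = change in storage + P + I + F + R  (all terms in mm;
# R is typically zero because the fields were furrow-diked).
P, I, F, R = 10.0, 25.0, 0.0, 0.0
ET = storage_change + P + I + F + R
```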

  9. Global Country Information Dataset 2023

    • kaggle.com
    zip
    Updated Jul 8, 2023
    + more versions
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
    Available download formats: zip (24063 bytes)
    Dataset updated
    Jul 8, 2023
    Authors
    Nidula Elgiriyewithana ⚡
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.


    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.

    Data Source: This dataset was compiled from multiple data sources

    If this was helpful, a vote is appreciated ❤️ Thank you 🙂
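    Several of the suggested use cases are correlation analyses; a self-contained Python sketch computing a Pearson correlation between two hypothetical columns (the values are illustrative, not taken from the dataset):

```python
from math import sqrt

# Pearson correlation coefficient, computed from scratch.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical mini-sample: a GDP-per-capita proxy vs life expectancy.
gdp = [1000, 5000, 20000, 45000]
life = [55.0, 65.0, 75.0, 82.0]
r = pearson(gdp, life)
```

    Applied to the real columns (e.g. GDP against life expectancy or infant mortality), the same function supports the cross-country comparisons listed above.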

  10. The global GPU Database market size is USD 455 million in 2024 and will...

    • cognitivemarketresearch.com
    pdf, excel, csv, ppt
    Cite
    Cognitive Market Research, The global GPU Database market size is USD 455 million in 2024 and will expand at a compound annual growth rate (CAGR) of 20.7% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/gpu-database-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global GPU Database market size was USD 455 million in 2024 and will expand at a compound annual growth rate (CAGR) of 20.7% from 2024 to 2031.

    Market Dynamics of the GPU Database Market

    Key Drivers. Growing demand for high-performance computing in various data-intensive industries: one of the main reasons the GPU Database market is growing is demand for high-performance computing (HPC) across data-intensive industries. These industries, including finance, healthcare, and telecommunications, require rapid data processing and real-time analytics, which GPU databases excel at providing. Unlike traditional CPU databases, GPU databases leverage the parallel processing power of GPUs to handle complex queries and large datasets more efficiently. This capability is crucial for applications such as machine learning, artificial intelligence, and big data analytics. The expansion of data and the increasing need for speed and scalability in processing are pushing enterprises to adopt GPU databases. Consequently, the market is poised for robust growth as organizations continue to seek solutions that offer enhanced performance, reduced latency, and greater computational power to meet their evolving data-management needs. Increasing demand for insights from the large volumes of data generated across verticals will drive the market's expansion in the years ahead.

    Key Restraints. A lack of trained professionals poses a serious threat to the GPU Database industry. The market also faces significant difficulties related to insufficient security options.

    Introduction to the GPU Database Market. The GPU database market is experiencing rapid growth due to increasing demand for high-performance data processing and analytics. GPUs (Graphics Processing Units) excel at parallel processing, making them ideal for handling large-scale, complex data sets with speed and efficiency. This market is driven by the proliferation of big data, advancements in AI and machine learning, and the need for real-time analytics across industries such as finance, healthcare, and retail. Companies are increasingly adopting GPU-accelerated databases to enhance data visualization, predictive analytics, and computational workloads. Key players in this market include established tech giants and specialized startups, all contributing to a competitive landscape marked by innovation and strategic partnerships. As organizations continue to seek faster and more efficient ways to harness their data, the GPU database market is poised for substantial growth, reshaping the future of data management and analytics.
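    The headline figures imply a concrete 2031 market size; compounding USD 455 million at a 20.7% CAGR over the seven years from 2024 to 2031:

```python
# Project the market size from the report's stated figures.
size_2024 = 455.0          # USD million
cagr = 0.207               # 20.7% compound annual growth rate
years = 2031 - 2024        # seven compounding years

size_2031 = size_2024 * (1 + cagr) ** years   # roughly USD 1.7 billion
```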

  11. Describing the performance of U.S. hospitals by applying big data analytics

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 1, 2023
    Cite
    Nicholas S. Downing; Alexander Cloninger; Arjun K. Venkatesh; Angela Hsieh; Elizabeth E. Drye; Ronald R. Coifman; Harlan M. Krumholz (2023). Describing the performance of U.S. hospitals by applying big data analytics [Dataset]. http://doi.org/10.1371/journal.pone.0179603
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nicholas S. Downing; Alexander Cloninger; Arjun K. Venkatesh; Angela Hsieh; Elizabeth E. Drye; Ronald R. Coifman; Harlan M. Krumholz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Public reporting of measures of hospital performance is an important component of quality improvement efforts in many countries. However, it can be challenging to provide an overall characterization of hospital performance because there are many measures of quality. In the United States, the Centers for Medicare and Medicaid Services reports over 100 measures that describe various domains of hospital quality, such as outcomes, the patient experience and whether established processes of care are followed. Although individual quality measures provide important insight, it is challenging to understand hospital performance as characterized by multiple quality measures. Accordingly, we developed a novel approach for characterizing hospital performance that highlights the similarities and differences between hospitals and identifies common patterns of hospital performance. Specifically, we built a semi-supervised machine learning algorithm and applied it to the publicly-available quality measures for 1,614 U.S. hospitals to graphically and quantitatively characterize hospital performance. In the resulting visualization, the varying density of hospitals demonstrates that there are key clusters of hospitals that share specific performance profiles, while there are other performance profiles that are rare. Several popular hospital rating systems aggregate some of the quality measures included in our study to produce a composite score; however, hospitals that were top-ranked by such systems were scattered across our visualization, indicating that these top-ranked hospitals actually excel in many different ways. Our application of a novel graph analytics method to data describing U.S. hospitals revealed nuanced differences in performance that are obscured in existing hospital rating systems.
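    The paper's semi-supervised graph algorithm is not reproduced here; as a rough stand-in, a tiny k-means clustering in plain Python illustrates the underlying idea of grouping hospitals with similar quality-measure profiles (all scores are hypothetical, on a 0-100 scale):

```python
# Squared Euclidean distance between two quality-measure profiles.
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Minimal k-means: assign each point to its nearest center, then
# move each center to the mean of its assigned points.
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            groups[i].append(p)
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Hypothetical hospitals: (outcomes score, patient-experience score).
hospitals = [(90, 85), (88, 91), (40, 35), (35, 42)]
centers, groups = kmeans(hospitals, centers=[(90, 90), (40, 40)])
```

    The dense and sparse clusters in the paper's visualization correspond, loosely, to large and small groups under a grouping of this kind; the authors' actual method additionally uses the graph structure of the data.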

  12. Data from: A Worldwide Historical Dam Failure's Database

    • borealisdata.ca
    • search.dataone.org
    Updated Oct 29, 2024
    Cite
    Mayari Bernard-Garcia; Tew-Fik Mahdi (2024). A Worldwide Historical Dam Failure's Database [Dataset]. http://doi.org/10.5683/SP2/E7Z09B
    Available formats: Croissant
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Borealis
    Authors
    Mayari Bernard-Garcia; Tew-Fik Mahdi
    License

    https://borealisdata.ca/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.5683/SP2/E7Z09B

    Description

    Assembled from 196 references, this database records a total of 3,861 cases of historical dam failures around the world and represents the largest compilation of dam failures recorded to date (17-02-2020). Failures are recorded regardless of the type of dam (e.g. man-made dam, tailings dam, temporary dam, natural dam), the type of structure (e.g. concrete dam, embankment dam), the mode of failure (e.g. piping failure, overtopping failure) or the properties of the dam (e.g. dam height, reservoir capacity). Through this process, a set of 45 variables (which compose the dataset) has been used, where possible, available and relevant, to record information about each failure (e.g. dam description, dam properties, breach dimensions). Coupled with Excel's functionalities (for Excel 2016: customizable screen visualization, search for specific cases, data filters, pivot tables, etc.), the XLSX file can easily be adapted to the needs of the user (research field, dam type, dam failure type, etc.) and opens the door to studies in various fields of research, such as hydrology, hydraulics and dam safety. The dataset also allows any user to streamline the verification process, to identify duplicates and to put the recorded historical dam failures back in context. Overall, this investigation work aims to standardize the collection of historical dam-failure data and to facilitate international data collection by setting guidelines. The sharing method (provided through this link, freely available to the international community) not only represents a considerable asset for a wide audience (e.g. researchers, dam owners) but also paves the way for the field of dam safety in the current era of "Big Data".
    Updated versions will be deposited at this DOI at undetermined frequencies in order to keep the recorded data current over the years.
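As a rough sketch of the filtering idea described above (Excel-style subsetting by dam type or failure mode) in Python; all record values and column names below are invented for illustration, not taken from the database's actual 45 variables:

```python
# Hypothetical records mirroring a few of the database's variables
# (names and values here are illustrative only).
records = [
    {"name": "Dam A", "type": "embankment", "failure": "overtopping", "height_m": 31},
    {"name": "Dam B", "type": "concrete",   "failure": "piping",      "height_m": 62},
    {"name": "Dam C", "type": "tailings",   "failure": "overtopping", "height_m": 18},
]

def subset(rows, **criteria):
    """Return rows matching every key=value criterion (an Excel-style filter)."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]

overtopping = subset(records, failure="overtopping")
print([r["name"] for r in overtopping])  # → ['Dam A', 'Dam C']
```

The same pattern composes: `subset(records, type="embankment", failure="overtopping")` narrows by both criteria, much as stacked Excel column filters would.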

  13. A dataset of knowledge graph construction for patents, sci-tech achievements and papers in agriculture, industry and service industry

    • scidb.cn
    Updated Oct 22, 2025
    hu hui ling; Zhai Jun; Li Mei; Li Xin; Shen Lixin (2025). A dataset of knowledge graph construction for patents, sci-tech achievements and papers in agriculture, industry and service industry [Dataset]. http://doi.org/10.57760/sciencedb.j00001.01576
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    Science Data Bank
    Authors
    hu hui ling; Zhai Jun; Li Mei; Li Xin; Shen Lixin
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    As important carriers of innovation activities, patents, sci-tech achievements and papers play an increasingly prominent role in national political and economic development against the background of a new round of technological revolution and industrial transformation. However, in a distributed and heterogeneous environment, the integration and systematic description of patent, sci-tech achievement and paper data are still insufficient, which limits the in-depth analysis and utilization of the related data resources. A knowledge graph dataset for patents, sci-tech achievements and papers is an important means of promoting innovation network research, and is of great significance for strengthening the development, utilization and knowledge mining of innovation data. This work collected data on patents, sci-tech achievements and papers from China's authoritative websites spanning the three major industries (agriculture, industry and services) during the period 2022-2025. After cleaning, organizing and normalization, a patents-sci-tech-achievements-papers knowledge graph dataset was formed, containing 10 entity types and 8 types of entity relationships. To ensure the quality and accuracy of the data, the entire process involved strict preprocessing, semantic extraction and verification, with an ontology model introduced as the schema layer of the knowledge graph. The dataset establishes direct correlations among patents, sci-tech achievements and papers through inventors/contributors/authors, and uses the Neo4j graph database for storage and visualization. The open dataset constructed in this study can serve as important foundational data for building knowledge graphs in the field of innovation, providing structured data support for innovation activity analysis, scientific research collaboration network analysis and knowledge discovery. The dataset consists of two parts.
The first part includes three Excel tables: 1,794 patent records with 10 fields, 181 paper records with 7 fields, and 1,156 scientific and technological achievement records with 11 fields. The second part is a knowledge graph dataset in CSV format that can be imported into Neo4j, comprising 10 entity files and 8 relationship files.
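As a hedged sketch of how the CSV part of the dataset could be loaded into Neo4j: the file and column names below (`patent.csv`, `patent_id`) are assumptions for illustration; the actual dataset ships 10 entity files and 8 relationship files whose headers may differ. One common pattern is to generate a `LOAD CSV` Cypher statement per file:

```python
# Build a LOAD CSV statement that creates one node per CSV row.
# Filename, label and id column are hypothetical placeholders.
def load_entity_cypher(filename: str, label: str, id_column: str) -> str:
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row "
        f"MERGE (n:{label} {{id: row.{id_column}}}) SET n += row"
    )

stmt = load_entity_cypher("patent.csv", "Patent", "patent_id")
print(stmt)
```

A statement like this would then be executed once per entity file through a Neo4j driver session; relationship files follow the same pattern with a `MATCH ... MERGE (a)-[:REL]->(b)` body instead.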

  14. Netflix Data: Cleaning, Analysis and Visualization

    • kaggle.com
    zip
    Updated Aug 26, 2022
    Abdulrasaq Ariyo (2022). Netflix Data: Cleaning, Analysis and Visualization [Dataset]. https://www.kaggle.com/datasets/ariyoomotade/netflix-data-cleaning-analysis-and-visualization
    Explore at:
    zip (276607 bytes)
    Dataset updated
    Aug 26, 2022
    Authors
    Abdulrasaq Ariyo
    License

    CC0 1.0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consist of titles added to Netflix from 2008 to 2021; the oldest title dates from 1925 and the newest from 2021. The dataset is cleaned with PostgreSQL and visualized with Tableau. The purpose of this dataset is to test my data cleaning and visualization skills. The cleaned data can be found below and the Tableau dashboard can be found here.

    Data Cleaning

    We are going to:

    1. Treat the nulls
    2. Treat the duplicates
    3. Populate missing rows
    4. Drop unneeded columns
    5. Split columns

    Extra steps, with more explanation of the process, are covered in the code comments.

    --View dataset
    
    SELECT * 
    FROM netflix;
    
    
    --The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
                                      
    SELECT show_id, COUNT(*)                                                                                      
    FROM netflix 
    GROUP BY show_id                                                                                              
    ORDER BY show_id DESC;
    
    --No duplicates
    
    --Check null values across columns
    
    SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
        COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
        COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
        COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
        COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
        COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
        COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
        COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
        COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
        COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
        COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
        COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
    FROM netflix;
    
    We can see that there are NULLS. 
    director_nulls = 2634
    movie_cast_nulls = 825
    country_nulls = 831
    date_added_nulls = 10
    rating_nulls = 4
    duration_nulls = 3 
    

    The director column nulls are about 30% of the whole column, so I will not delete them; instead, I will populate them from another column. To populate the director column, we want to find out whether there is a relationship between the movie_cast column and the director column.

    -- Below, we find out if some directors are likely to work with particular cast
    
    WITH cte AS
    (
    SELECT title, CONCAT(director, '---', movie_cast) AS director_cast 
    FROM netflix
    )
    
    SELECT director_cast, COUNT(*) AS count
    FROM cte
    GROUP BY director_cast
    HAVING COUNT(*) > 1
    ORDER BY COUNT(*) DESC;
    
    -- With this, we can now populate the NULL director rows
    -- using their associated movie_cast record
    
    UPDATE netflix 
    SET director = 'Alastair Fothergill'
    WHERE movie_cast = 'David Attenborough'
    AND director IS NULL;
    
    --Repeat this step to populate the rest of the director nulls
    --Populate the rest of the NULL in director as "Not Given"
    
    UPDATE netflix 
    SET director = 'Not Given'
    WHERE director IS NULL;
    
    --When I was doing this, I found a less complex and faster way to populate a column which I will use next
    

    Just like the director column, I will not delete the nulls in country. Since the country column is related to the director and movie columns, we are going to populate the country column using the director column.

    --Populate the country using the director column
    
    SELECT COALESCE(nt.country,nt2.country) 
    FROM netflix AS nt
    JOIN netflix AS nt2 
    ON nt.director = nt2.director 
    AND nt.show_id <> nt2.show_id
    WHERE nt.country IS NULL;
    UPDATE netflix
    SET country = nt2.country
    FROM netflix AS nt2
    WHERE netflix.director = nt2.director and netflix.show_id <> nt2.show_id 
    AND netflix.country IS NULL;
    
    
    --Confirm whether any rows still have a NULL country after the director-based update
    
    SELECT director, country, date_added
    FROM netflix
    WHERE country IS NULL;
    
    --Populate the rest of the NULLs in country as "Not Given"
    
    UPDATE netflix 
    SET country = 'Not Given'
    WHERE country IS NULL;
    

    The date_added column has only 10 nulls out of over 8,000 rows, so deleting them cannot affect our analysis or visualization.

    --Show date_added nulls
    
    SELECT show_id, date_added
    FROM netflix
    WHERE date_added IS NULL;
    
    --DELETE nulls
    
    DELETE FROM netflix
    WHERE date_added IS NULL;
    
  15. Real Estate Data

    • kaggle.com
    zip
    Updated Jun 7, 2024
    AgarwalYashhh (2024). Real Estate Data [Dataset]. https://www.kaggle.com/datasets/agarwalyashhh/gurgaon-real-estate-data/code
    Explore at:
    zip (1245852 bytes)
    Dataset updated
    Jun 7, 2024
    Authors
    AgarwalYashhh
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The dataset contains 4 files. The Excel file is the original file after scraping the data from the website, but it is very raw and uncleaned. After spending a lot of time, I cleaned the data into the form I thought best represents the dataset and can be used for projects. Explore all the datasets and share your notebooks and insights! Consider upvoting if you find it helpful. Thank you.

  16. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all transactions that occurred over a period of time. The retailer will use the results to grow the business and to provide customers with itemset suggestions, enabling increased customer engagement, an improved customer experience and insight into customer behavior. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association Rules are most often used when you plan to build associations between different objects in a set, and they work well for finding frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
    - support = P(mouse & mat) = 8/100 = 0.08
    - confidence = support / P(computer mouse) = 0.08/0.10 = 0.8
    - lift = confidence / P(mouse mat) = 0.8/0.09 ≈ 8.9
    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
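The arithmetic above can be checked with a few lines of Python; this is only a sketch of the three standard metrics, using the toy counts from the example:

```python
# Support, confidence and lift for the rule A => B,
# given total customers n and the counts who bought A, B, and both.
def rule_metrics(n, n_a, n_b, n_ab):
    support = n_ab / n
    confidence = n_ab / n_a          # P(A and B) / P(A)
    lift = confidence / (n_b / n)    # confidence / P(B)
    return support, confidence, lift

s, c, l = rule_metrics(100, 10, 9, 8)
print(round(s, 2), round(c, 2), round(l, 2))  # → 0.08 0.8 8.89
```

A lift well above 1, as here, indicates the two items co-occur far more often than independence would predict.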

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.


    Libraries in R

    First, we need to load required libraries. Shortly I describe all libraries.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; the package makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel files in R.
    • plyr - Tools for splitting, applying and combining data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression, with flexible support for the type of right-hand-side expressions.
    • dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.


    Data Pre-processing

    Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.


    Next, we clean our data frame by removing missing values.


    To apply Association Rule mining, we need to convert the dataframe into transaction data, so that all items bought together in one invoice will be in ...
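The conversion described above (grouping rows by invoice into item baskets, the "transactions" shape that arules consumes) can be sketched in Python on toy data; the rows and support threshold below are illustrative, not taken from the actual retaildata file:

```python
from collections import defaultdict
from itertools import combinations

# Toy (BillNo, Itemname) rows standing in for the real retaildata file.
rows = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"), (2, "eggs"), (3, "milk")]

# Group items by invoice: one basket (set of items) per BillNo.
baskets = defaultdict(set)
for bill, item in rows:
    baskets[bill].add(item)

# Count pair co-occurrence and keep pairs meeting a minimum support.
min_support, n = 0.5, len(baskets)
pair_counts = defaultdict(int)
for items in baskets.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

frequent = {p: c / n for p, c in pair_counts.items() if c / n >= min_support}
print(frequent)  # keeps only ('bread', 'milk'), support ≈ 0.67
```

This brute-force pair count is the seed of what Apriori does at scale: prune itemsets below the support threshold before extending them to larger sets.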

  17. Retail data analysis project (excel)

    • kaggle.com
    zip
    Updated Dec 9, 2024
    Soe Yan Naung (2024). Retail data analysis project (excel) [Dataset]. https://www.kaggle.com/datasets/ericyang19/retail-data-analysis-project-excel
    Explore at:
    zip (4306415 bytes)
    Dataset updated
    Dec 9, 2024
    Authors
    Soe Yan Naung
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    In this project, I conducted a comprehensive analysis of retail and warehouse sales data to derive actionable insights. The primary objective was to understand sales trends, evaluate performance across channels, and identify key contributors to overall business success.

    To achieve this, I transformed raw data into interactive Excel dashboards that highlight sales performance and channel contributions, providing a clear and concise representation of business metrics.

    Key Highlights of the Project:

    • Created two dashboards: a Sales Dashboard and a Contribution Dashboard.
    • Answered critical business questions, such as monthly trends, channel performance and top contributors.
    • Presented actionable insights with professional visuals, making it easy for stakeholders to make data-driven decisions.

  18. Supply Chain DataSet

    • kaggle.com
    zip
    Updated Jun 1, 2023
    Amir Motefaker (2023). Supply Chain DataSet [Dataset]. https://www.kaggle.com/datasets/amirmotefaker/supply-chain-dataset
    Explore at:
    zip (9340 bytes)
    Dataset updated
    Jun 1, 2023
    Authors
    Amir Motefaker
    Description

    Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.

  19. Iris Dataset - various format types

    • kaggle.com
    zip
    Updated May 3, 2024
    Nanda Prasetia (2024). Iris Dataset - various format types [Dataset]. https://www.kaggle.com/datasets/nandaprasetia/iris-dataset-various-format-types
    Explore at:
    zip (24187 bytes)
    Dataset updated
    May 3, 2024
    Authors
    Nanda Prasetia
    License

    CC0 1.0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Iris Dataset consists of 150 iris samples, each having four numerical features: sepal length, sepal width, petal length, and petal width. Each sample is categorized into one of three iris species: Setosa, Versicolor, or Virginica. This dataset is widely used as a sample dataset in machine learning and statistics due to its simple and easily understandable structure.

    Feature Information : - Sepal Length (cm) - Sepal Width (cm) - Petal Length (cm) - Petal Width (cm)

    Target Information : - Iris Species : 1. Setosa 2. Versicolor 3. Virginica

    Source : The Iris Dataset is obtained from the scikit-learn (sklearn) library under the BSD (Berkeley Software Distribution) license.

    File Formats :

    1. CSV (Comma-Separated Values): CSV format is the most commonly used and easily readable format. Each row represents one sample with its features separated by commas.
    2. Excel (.xlsx): Excel format is suitable for further data analysis, visualization, and integration with other software.
    3. JSON (JavaScript Object Notation): JSON format allows data to be stored in a more complex structure, suitable for web-based data processing or applications.
    4. Parquet: Parquet format is an efficient columnar data format for large and complex data.
    5. HDF5 (Hierarchical Data Format version 5): HDF5 format stores data in hierarchical groups and datasets, excellent for storing large scientific and numerical data.
    6. Feather: Feather format is a lightweight binary format for storing data frames. It provides excellent performance for reading and writing data.
    7. SQLite Database (.db, .sqlite): SQLite is a lightweight database format suitable for local data storage and querying. It is widely used for small to medium-scale applications.
    8. Msgpack: Msgpack format is a binary serialization format that is efficient in terms of storage and speed. It is suitable for storing and transmitting data efficiently between systems.
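As a small illustration of formats 1, 3 and 7 from the list above, the same samples can be written with Python's standard library alone (the two rows below are illustrative values in the dataset's schema, not the full 150 samples):

```python
import csv
import io
import json
import sqlite3

# Two sample rows in the dataset's schema (values illustrative).
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "species": "Setosa"},
    {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0, "petal_width": 2.5, "species": "Virginica"},
]

# CSV (format 1): one line per sample, comma-separated.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# JSON (format 3): supports nested, web-friendly structures.
json_text = json.dumps(rows)

# SQLite (format 7): a queryable local database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE iris (sepal_length REAL, sepal_width REAL, "
            "petal_length REAL, petal_width REAL, species TEXT)")
con.executemany("INSERT INTO iris VALUES (?, ?, ?, ?, ?)",
                [tuple(r.values()) for r in rows])
count = con.execute("SELECT COUNT(*) FROM iris").fetchone()[0]
print(count)  # → 2
```

The binary formats in the list (Parquet, HDF5, Feather, Msgpack) need third-party libraries, but the round-trip idea is the same: one schema, many serializations.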

    The Iris Dataset is one of the most iconic datasets in the world of machine learning and data science. This dataset contains information about three species of iris flowers: Setosa, Versicolor, and Virginica. With features like sepal and petal length and width, the Iris dataset has been a stepping stone for many beginners in understanding the fundamental concepts of classification and data analysis. With its clarity and diversity of features, the Iris dataset is perfect for exploring various machine learning techniques and building accurate classification models. I present the Iris dataset from scikit-learn with the hope of providing an enjoyable and inspiring learning experience for the Kaggle community!

  20. Customer Shopping Trends Dataset

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
    Explore at:
    zip (149846 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    Sourav Banerjee
    Description

    Context

    The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

    Content

    This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

    Dataset Glossary (Column-wise)

    • Customer ID - Unique identifier for each customer
    • Age - Age of the customer
    • Gender - Gender of the customer (Male/Female)
    • Item Purchased - The item purchased by the customer
    • Category - Category of the item purchased
    • Purchase Amount (USD) - The amount of the purchase in USD
    • Location - Location where the purchase was made
    • Size - Size of the purchased item
    • Color - Color of the purchased item
    • Season - Season during which the purchase was made
    • Review Rating - Rating given by the customer for the purchased item
    • Subscription Status - Indicates if the customer has a subscription (Yes/No)
    • Shipping Type - Type of shipping chosen by the customer
    • Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
    • Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
    • Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
    • Payment Method - Customer's most preferred payment method
    • Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)


    Acknowledgement

    This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

    Cover Photo by: Freepik

    Thumbnail by: Clothing icons created by Flat Icons - Flaticon

Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1

Dataset of development of business during the COVID-19 crisis

Explore at:
Dataset updated
Nov 9, 2020
Authors
Tatiana N. Litvinova
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic) that are represented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, up to 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected. The arithmetic averages and the changes (increases) were calculated for indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators across all countries in the sample were then found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data are collected in a single Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics with entrepreneurship statistics, and it is flexible: it can be supplemented with data from other countries and with newer statistics on the COVID-19 pandemic. Because the values in the dataset are not ready-made numbers but formulas, adding and/or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data but also charts that visualize the data, and it contains both actual and forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020.
The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and after the second wave of the pandemic to check the reliability of the earlier forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical initial and predicted values of the studied indicators but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.
