5 datasets found
  1. Data Exploration

    • schoolboard-esrica-k12admin.hub.arcgis.com
    Updated Nov 5, 2021
    Education and Research (2021). Data Exploration [Dataset]. https://schoolboard-esrica-k12admin.hub.arcgis.com/datasets/edu::data-exploration
    Dataset updated
    Nov 5, 2021
    Dataset authored and provided by
    Education and Research
    Description

    ArcGIS Online is a powerful tool to engage students in their learning. It's also a great way to access, visualize and analyse data in the form of maps, charts and graphs. Use ArcGIS Online to find relevant data for assignments and projects. If you need an ArcGIS Online account, request one here. Things to consider:

  2. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rules are most often used to find associations between different objects in a set, for example frequent patterns in a transaction database. They can tell you which items customers frequently buy together, which allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.

    • Rule: bought computer mouse => bought mouse mat
    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat) = 0.80/0.09 = 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
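As a rough check of the arithmetic in this example, the three measures can be computed directly from the counts. This is a minimal sketch in Python rather than the R used by the dataset author; the function name `rule_metrics` and its parameter names are made up for illustration. It uses the conventional definitions for a rule A => B: confidence divides by the antecedent (the mouse), and lift divides the confidence by the consequent's share (the mat).

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    All arguments are transaction counts; the names are illustrative only.
    """
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A) = P(A and B) / P(A)
    lift = confidence / (n_consequent / n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought a mouse,
# 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1, as here, suggests the two items are bought together far more often than would be expected if the purchases were independent.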

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean the data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
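The conversion step can be illustrated with a small sketch. The dataset author works in R with the arules package; the version below is a plain-Python stand-in, not the author's code. It assumes each record has been reduced to a (BillNo, Itemname) pair, and the sample rows are invented for illustration. Rows with a missing value are skipped, which mirrors the cleaning step described above.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (BillNo, Itemname) rows into one set of items per invoice."""
    baskets = defaultdict(set)
    for bill_no, item in rows:
        if bill_no is None or item is None:
            continue  # drop rows with missing values
        baskets[bill_no].add(item)
    return dict(baskets)


# Hypothetical sample rows; the real file has 522065 rows and 7 attributes.
rows = [
    ("536365", "MOUSE"),
    ("536365", "MOUSE MAT"),
    ("536366", "KEYBOARD"),
    ("536366", None),
]
transactions = to_transactions(rows)
for bill_no, items in transactions.items():
    print(bill_no, sorted(items))
```

Each invoice number then maps to the set of items bought together, which is the shape an association rule algorithm such as Apriori expects as input.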

  3. Teaching Data Viz and Communication as an Undergraduate Biology Course:...

    • qubeshub.org
    Updated Jun 23, 2021
    Kristine Grayson; Angie Hilliker (2021). Teaching Data Viz and Communication as an Undergraduate Biology Course: Assignments and Projects [Dataset]. http://doi.org/10.25334/5C87-YE71
    Dataset updated
    Jun 23, 2021
    Dataset provided by
    QUBES
    Authors
    Kristine Grayson; Angie Hilliker
    Description

    Teaching materials co-developed for a new upper-level undergraduate biology course to teach data exploration and communication without requiring previous coding experience.

  4. Research on the Innovation of Teaching Evaluation and Feedback Mechanism for...

    • scidb.cn
    Updated Sep 9, 2025
    Zuo (2025). Research on the Innovation of Teaching Evaluation and Feedback Mechanism for Food Specialization Based on Generative Artificial Intelligence [Dataset]. http://doi.org/10.57760/sciencedb.27965
    Dataset updated
    Sep 9, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Zuo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset focuses on the research on the innovation of assessment and feedback mechanisms for food professional teaching using generative artificial intelligence (GenAI). It consists of two sub-datasets, collecting relevant information from both the teacher and student perspectives, providing rich data support for in-depth exploration of the application effect and influence of GenAI in food professional teaching.

    Teacher perspective dataset ('Generation-Driven Innovation of Assessment and Feedback Mechanisms in Food Professional Teaching.xlsx')

    1. Data size: Contains 31 records, covering 32 different dimensions of relevant information.
    2. Data content

      • Basic information: Records the submission time of the answer sheet, the time spent, and the source, which can be used to understand the process and channels of data collection.
      • GenAI application situation: Involves the proportion of class hours where teachers use GenAI for assessment, as well as the scenarios where teachers consider GenAI assessment to be the most effective, such as theoretical assignments, laboratory reports, product design, and classroom interactions.
      • Teaching effect feedback: Includes the changes in the average scores of students in teaching links such as theoretical exams, laboratory reports, and product design after the introduction of GenAI assessment, as well as the reduction in grading time and the timeliness of feedback after using GenAI for automatic feedback.
      • Problems and solutions: Records the biggest conflicts encountered, whether GenAI assignment abuse was found, and the most effective identification methods (such as questioning details during the defense, on-site review of experimental operations, AI detection tools, etc.). It also includes content that needs to be publicly disclosed to improve assessment reliability, such as the source of AI model training data, the proportion of manual review of AI results, etc.
      • Teaching improvement direction: Involves teachers' views on GenAI in improving students' grades/capability, saving time costs, and cultivating practical innovation abilities, as well as evaluations of GenAI in real-time capturing experimental operation scores, automatically associating the latest food industry national standards, multimodal feedback, and academic compliance detection functions.

    Student perspective dataset ('Generation-Driven Innovation of Assessment and Feedback Mechanisms in Food Professional Teaching Student Version.xlsx')

    1. Data size: Contains 136 records, involving 29 related variables.
    2. Data content

      • Basic information: Similarly records the submission time of the answer sheet, the time spent, and the source, providing background information for data analysis.
      • Learning experience and comparison: Students' sources of understanding previous levels, and whether they believe they have performed better than previous students who did not use GenAI in terms of theoretical knowledge mastery, food laboratory operation standardization, and application of industry standards.
      • GenAI feedback impact: Includes the impact of personalized GenAI feedback on adjusting the frequency of learning focus, enhancing learning interest, and modifying homework/reports, as well as the final score changes. It also involves changes in task completion time (such as laboratory report writing, industry plan design) due to GenAI feedback.
      • Problem feedback: Whether students encountered difficulties in understanding the GenAI feedback and can describe specific cases. At the same time, students' views on the content that must be disclosed to trust GenAI assessment (such as comparison of previous and current score standards, statistics of AI scoring errors, appeal and review process, etc.).
      • Expectation function evaluation: Students' expectations for GenAI in improving grades/capability, saving time costs, and cultivating practical innovation abilities, as well as evaluations of GenAI in real-time capturing experimental operation scores, automatically associating the latest food industry national standards, multimodal feedback, and academic compliance detection functions.

  5. Netflix Movies & TV Shows dataset

    • kaggle.com
    Updated Oct 3, 2025
    Zubaira Maimona (2025). Netflix Movies & TV Shows dataset [Dataset]. https://www.kaggle.com/datasets/zubairamuti/netflix-movies-and-tv-shows-dataset
    Dataset updated
    Oct 3, 2025
    Dataset provided by
    Kaggle
    Authors
    Zubaira Maimona
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    Regarding this dataset, Netflix is among the most popular websites for streaming movies and videos. They have more than 200 million members globally as of the middle of 2021, and their platform offers over 8,000 movies and TV shows. This tabular dataset contains listings of all the movies and TV shows available on Netflix, together with details about the actors, directors, ratings, length, year of release, and other details.

    Interesting Task Ideas for People from Different Backgrounds

    For Analysts of Data

    1. Content Trends Over Time - Examine the annual changes in Netflix's movie and TV show counts.
    2. Genre Popularity - Discover the most popular genres and how their popularity changes by location or year.
    3. Country Insights - Find out which nations produce the most shows and what kinds of content they contribute.
    4. Ratings Distribution - Show how the maturity ratings (G, PG, R, TV-MA) are distributed throughout Netflix material.
    5. Best Directors & Actors - Find the actors or directors who show up on Netflix the most.

    For Data Scientists

    1. Recommendation System Prototype - Create a content-based recommender by utilizing genres and title descriptions.
    2. Text Analysis on Descriptions - Apply natural language processing (NLP) to identify trends in the way Netflix characterizes its material using terms like "crime," "adventure," and "love."
    3. Classification Models - Use metadata to determine if a title is a movie or a TV show.
    4. Clustering - Using genres, lengths, and descriptions, group films and television series into clusters.
    5. Trend Forecasting - Forecast future growth in the Netflix library using time-series analysis.

    For Students (Study Assignments)

    1. Data Cleaning & Preprocessing - Standardize formats and deal with missing variables (such as directors/countries).
    2. Exploratory Data Analysis (EDA): Make notebooks or dashboards with a ton of graphics that illustrate Netflix trends.
    3. Data Visualization Practice - Create imaginative graphics such as word clouds or heatmaps using Matplotlib, Seaborn, or Plotly.
    4. Storytelling with Data - Compose a data tale on how Netflix changed from renting out DVDs to becoming a major worldwide streaming service.
    5. Beginner Machine Learning – Start small: use genre or description to forecast maturity rating.

    Approach to the Netflix Dataset

    1. Understand the Data (Initial Exploration)

      • Load the dataset and check its size, columns, and data types.
      • Get a sense of the key fields: title, type, country, release_year, rating, etc.
      • Look for unique values (e.g., how many genres, countries, ratings).
    2. Data Cleaning & Preprocessing

      • Handle missing values (some entries don’t have directors or countries).
      • Standardize inconsistent formats (e.g., dates in date_added).
      • Split multi-valued columns (like genres or cast) if needed.
      • Convert durations into numeric values (minutes or seasons).
    3. Exploratory Data Analysis (EDA)

      • Compare Movies vs. TV Shows count.
      • Analyze content growth trend by release year or date added.
      • Study genre popularity across different countries.
      • Explore rating distribution (family-friendly vs. mature content).
      • Identify most frequent directors, actors, and countries.
    4. Visualization & Storytelling

      • Create bar charts, pie charts, heatmaps, and timelines.
      • Use word clouds for descriptions and genres.
      • Highlight interesting trends (e.g., rise of international TV shows).
    5. Advanced Analysis / Data Science Tasks

      • Build a recommendation system (based on genres & descriptions).
      • Perform sentiment/keyword analysis on descriptions.
      • Apply clustering to group similar shows/movies.
      • Predict whether a title is a movie or TV show from metadata.
    6. Insights & Reporting

      • Summarize key findings (e.g., “TV shows are growing faster than movies,” “US and India dominate Netflix content”).
      • Create dashboards (Tableau, Power BI, or Python libraries like Plotly).
      • Share a story rather than just numbers—make it human and relatable.
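One of the cleaning steps in the approach above, converting durations into numeric values, can be sketched as follows. This is only an illustration under an assumption the listing does not confirm: that the duration field holds strings such as "90 min" or "2 Seasons". The function name `parse_duration` is hypothetical.

```python
import re


def parse_duration(text):
    """Split a duration string into (value, unit), or return None.

    Assumes strings shaped like "90 min" or "2 Seasons"; anything else,
    including missing values, yields None.
    """
    if not text:
        return None
    match = re.fullmatch(r"\s*(\d+)\s*(min|seasons?)\s*", text, flags=re.IGNORECASE)
    if match is None:
        return None
    unit = "minutes" if match.group(2).lower() == "min" else "seasons"
    return int(match.group(1)), unit


for raw in ["90 min", "2 Seasons", "1 Season", None, "unknown"]:
    print(repr(raw), "->", parse_duration(raw))
```

Keeping the unit alongside the number avoids mixing movie lengths in minutes with TV show lengths in seasons when computing summary statistics.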