This dataset was created by Mark Dobres
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Oscar NG
Released under CC0: Public Domain
This dataset was created by Yuxian Chen
This dataset was created by Will Newt
To make this a seamless process, I cleaned the data and deleted many variables that I thought were not relevant to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file contains both Lyft and Uber trips; it is the cleaned version of the dataset we downloaded from Kaggle.
You can easily subset the data into the car types you will be modeling by first loading the CSV into R. Here is the code for how to do this:
# load the cleaned Uber data
df <- read.csv('uber.csv')
# keep only the rows for the 'Black' car type
df_black <- subset(df, df$name == 'Black')
# write the subset out; row.names = FALSE avoids an extra index column
write.csv(df_black, "nameofthefileyouwanttosaveas.csv", row.names = FALSE)
# getwd() shows the working directory where the file was saved
getwd()
https://www.usa.gov/government-works/
This dataset was created by chimaralavamshireddy
Released under U.S. Government Works
This dataset was created by Prutchakorn
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed metadata for over 240,000 video games sourced from the IGDB API. It includes information about each game's release, genres, themes, platforms, developers, publishers, player perspectives, game modes, ratings, summaries, media assets (screenshots, artworks, covers), and more. This dataset is ideal for projects in game recommendation, clustering, tagging, genre analysis, and player preference modeling.
http://insideairbnb.com/get-the-data.html
A. Is there seasonality in the prices of properties listed on Airbnb Berlin?
B. Which areas of Berlin are popular among tourists?
C. An analysis of reviews, using text mining
D. Which amenities are most commonly available in Berlin properties?
E. Can we predict the price of properties in Berlin from the other column values?
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Danizo
Released under Apache 2.0
This dataset was created by Sandy3108
Market basket analysis with the Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow its business by suggesting itemsets to customers, which lets us increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem with Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most useful when you want to discover associations between different objects in a set, that is, frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.8
- lift = confidence / P(mat) = 0.8/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
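As a quick check, the same numbers can be computed directly in base R (a minimal sketch using only the counts from the example above):

n <- 100        # total customers
n_mouse <- 10   # bought a computer mouse
n_mat <- 9      # bought a mouse mat
n_both <- 8     # bought both
support <- n_both / n                  # 0.08
confidence <- support / (n_mouse / n)  # 0.8
lift <- confidence / (n_mat / n)       # ~8.9
c(support = support, confidence = confidence, lift = lift)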
Number of Attributes: 7
[Screenshot: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]
First, we need to load the required libraries. I briefly describe each library below.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]
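The exact library list is only visible in the screenshot above; as a sketch, a typical set for this workflow might be:

library(readxl)    # read the .xlsx data file
library(dplyr)     # data cleaning and manipulation
library(arules)    # association rule mining (Apriori)
library(arulesViz) # plotting association rules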
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
[Screenshot: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]
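A minimal sketch of this step, assuming the workbook sits in the working directory:

# read the Excel workbook into a data frame
data <- readxl::read_excel("Assignment-1_Data.xlsx")
# inspect the first rows and the column types
head(data)
str(data)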
After that, we will clean our data frame by removing rows with missing values.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]
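A minimal sketch of the cleaning step (the original may also drop unneeded columns):

# keep only complete rows, i.e., drop rows with missing values
data <- data[complete.cases(data), ]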
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice end up in a single transaction.
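A sketch of the conversion and the Apriori call with the arules package; the invoice and item column names (BillNo, Itemname) are assumptions about the workbook, and the thresholds are examples to tune:

library(arules)
# one transaction per invoice: all items bought together on a bill
trans <- as(split(data$Itemname, data$BillNo), "transactions")
# mine association rules with example support/confidence thresholds
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.8))
# show the strongest rules by lift
inspect(head(sort(rules, by = "lift")))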
This dataset was created by Ahmed Izaz Bhuiyan
These are artificially made beginner data mining datasets for learning purposes.
Case study:
The aim of the FeelsLikeHome_Campaign dataset is a project in which you build a predictive model (using a sample of 2,500 clients' data) forecasting the highest profit from the next marketing campaign, i.e., indicating the customers who will be most likely to accept the offer.
The aim of the FeelsLikeHome_Cluster dataset is a project in which you split the company's customer base into homogeneous clusters (using 5,000 clients' data) and propose draft marketing strategies for these groups based on customer behavior and profile information.
The FeelsLikeHome_Score dataset can be used to calculate the total profit from a marketing campaign and to produce a list of customers sorted by the predicted probability of the dependent variable in the predictive modeling problem.
https://creativecommons.org/publicdomain/zero/1.0/
This is the main data set built for the work titled "A Web Mining Approach to Collaborative Consumption of Food Delivery Services", the official institutional research project of Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz.
Urban Transportation, Consumer, e-Commerce Retail
Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a generated representation of an open-pit mining block model, designed to reflect realistic geological, spatial, and economic conditions found in large-scale mineral extraction projects. It contains 75,000 individual blocks, each representing a unit of earth material with associated attributes that influence decision-making in mine planning and resource evaluation.
The dataset includes essential parameters such as ore grade, tonnage, economic value, and operational costs. A calculated profit value and a corresponding binary target label indicate whether a block is considered economically viable for extraction. This setup supports various types of analysis, such as profitability assessments, production scheduling, and resource categorization.
🔑 Key Features
Block_ID: Unique identifier for each block in the model.
Spatial Coordinates (X, Y, Z): 3D location data representing the layout of the deposit.
Rock Type: Geological classification of each block (e.g., Hematite, Magnetite, Waste).
Ore Grade (%): Iron content percentage for ore-bearing blocks; set to 0% for waste.
Tonnage (tonnes): Total mass of the block, used in volume and value calculations.
Ore Value (¥/tonne): Estimated revenue based on grade and market assumptions.
Mining Cost (¥): Estimated extraction cost per block.
Processing Cost (¥): Cost associated with refining ore-bearing blocks.
Waste Flag: Indicates whether a block is classified as waste material (1 = Waste, 0 = Ore).
Profit (¥): Net value after subtracting mining and processing costs from potential revenue.
Target: Label indicating whether a block is economically profitable (1 = Yes, 0 = No).
This dataset is ideal for applications related to mineral resource evaluation, production planning, and profitability analysis. It can also be used for teaching and demonstration purposes in mining engineering and resource management contexts.
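To illustrate how Profit and Target relate to the other fields, here is a minimal sketch in R; the column names and the revenue formula are assumptions based on the feature descriptions above, not taken from the dataset itself:

# revenue per block: ore value per tonne times tonnage (assumed formula)
blocks$Revenue <- blocks$Ore_Value * blocks$Tonnage
# profit: revenue minus mining and processing costs
blocks$Profit <- blocks$Revenue - blocks$Mining_Cost - blocks$Processing_Cost
# target label: 1 if the block is economically profitable
blocks$Target <- as.integer(blocks$Profit > 0)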
https://cdla.io/sharing-1-0/
This dataset was created for use in the IZU YAM433 - Data Mining course project. You must read the column descriptions to understand the data.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Andrew Dolcimascolo-Garrett
Released under MIT
This dataset was created by Dr THABIT FURSAN
Sequential pattern mining is the discovery of subsequences that are frequent in a set of sequences. The process is similar to frequent itemset mining, except that the input database is ordered. As its output, a sequential pattern mining algorithm generates the set of frequent sequential patterns: the subsequences whose frequency in the database is greater than or equal to the user-specified minimum support.
Consider the data set shown in Table 1, where the events in each tuple are accompanied by their instants of occurrence.
[Table 1: https://pasteboard.co/JRNB4rH.png]
We can note that, for a fixed threshold equal to 1, the pattern <A, B, C> is considered frequent because its support (its number of occurrences in the database) is equal to 2.
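As a minimal sketch (the data from Table 1 is only available as an image, so the toy database below is assumed for illustration), here is how the support of a sequential pattern can be counted in R:

# TRUE if 'pattern' occurs as a subsequence of 'sequence' (order preserved)
is_subsequence <- function(pattern, sequence) {
  pos <- 0
  for (ev in pattern) {
    idx <- which(sequence == ev & seq_along(sequence) > pos)
    if (length(idx) == 0) return(FALSE)
    pos <- idx[1]
  }
  TRUE
}

# toy database of three sequences
db <- list(c("A", "B", "C"), c("A", "C", "B", "C"), c("B", "A", "C"))
# support of <A, B, C>: number of sequences containing it (here 2)
sum(vapply(db, function(s) is_subsequence(c("A", "B", "C"), s), logical(1)))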
Let us return to the example given in Table 1. <A, B, C> is a frequent sequential pattern: it shows that events A, B, and C frequently occurred in sequence, but it provides no additional information about the gap between them. For instance, we do not know when B will happen, knowing that A already did. Therefore, we ask you to provide a richer pattern in which time constraints are considered. In our example data set, we can deduce that A, B, and C occur sequentially, that B occurs after A by at least 1 instant and at most 5 instants, and that C occurs after B within the interval [2, 4] of instants. We represent this pattern as A[1,5]B and B[2,4]C. It is a directed graph whose nodes are events and whose edges are labeled with instant intervals, denoting the time constraints, as shown in Figure 1.
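As an illustrative sketch (the representation of the timestamped sequences is assumed, since Table 1 is only available as an image), the [min, max] gap annotation for an edge such as A[1,5]B can be derived in R like this:

# each sequence: a data frame of events with their instants of occurrence
seqs <- list(data.frame(event = c("A", "B", "C"), time = c(1, 2, 6)),
             data.frame(event = c("A", "B", "C"), time = c(3, 8, 10)))

# gaps between each occurrence of 'from' and the next occurrence of 'to'
gap_interval <- function(seqs, from, to) {
  gaps <- unlist(lapply(seqs, function(s) {
    sapply(s$time[s$event == from], function(tf) {
      later <- s$time[s$event == to & s$time > tf]
      if (length(later) > 0) min(later) - tf else NA
    })
  }))
  range(gaps, na.rm = TRUE)  # c(min gap, max gap)
}

gap_interval(seqs, "A", "B")  # returns c(1, 5) here, i.e., A[1,5]B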
[Figure 1: https://pasteboard.co/JRNBWWL.png]
Formally:
Definition (Event). An event is a couple (e, t), where e ∈ E is the type of the event and t ∈ T is its time.
Definition (Sequence). Let E be a set of event types and T a time domain such that T ⊆ ℝ. E is assumed to be totally ordered and is denoted ...