By Jeffrey Mvutu Mabilama [source]
This dataset brings you closer than ever to the reality of top-selling products and their performance on e-commerce platforms. It gives you detailed lists of each product's features, ratings, sales, reviews and other metrics so that you can understand what makes a successful summer product on Wish. With this data at hand, you have access not only to a curated list of top summer products but also to the power of analytics for boosting your business operations.
This dataset contains information about summer product listings, ratings, and sales performance data on the Wish e-commerce platform. Using this information you can understand how well certain products sell, what the average price of products is in the summer season, and gain many more interesting insights from this dataset.
- Estimating the optimal pricing strategy for a product based on its ratings, merchant ratings count, mean discount and other metrics. This would help businesses to determine which pricing strategy would produce the most profits while still keeping customers interested in their products.
- Analyzing the performance of seasonal summer products by studying correlations between them, and their ratings, units sold and prices etc., allowing businesses to identify trends more accurately and improve sales strategies accordingly.
- Tracking sellers’ reputation across different countries by analyzing customer reviews for each product they list, in order to better understand how location affects sales performance and to evaluate customer satisfaction with particular sellers regarding shipping times or the quality of the products they supply.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: summer-products-with-rating-and-performance_2020-08.csv

| Column name | Description |
|:---|:---|
| title | The title of the product. (String) |
| title_orig | The original title of the product. (String) |
| price | The price of the product. (Float) |
| retail_price | The original retail price of the product. (Float) |
| currency_buyer | The currency of the buyer. (String) |
| units_sold | The number of units sold. (Integer) |
| uses_ad_boosts | A flag indicating if the product has been boosted using ads. (Boolean) |
| rating | The rating of the product. (Float) |
| rating_count | The total number of ratings for the product. (Integer) |
| rating_five_count | The number of five star ratings for the product. (Integer) |
| rating_four_count | The number of four star ratings for the product. (Integer) |
| rating_three_count | The number of three star ratings for the product. (Integer) |
| rating_two_count | The number of two star ratings for the product. (Integer) |
| rating_one_count | The number of one star ratings for the product. (Integer) |
| badges_count | The number of badges associated with the product. (Integer) |
| badge_local_product | A flag indicating if the product is a local product. (Boolean) |
| badge_product_quality | A flag indicating if the product has a quality badge. (Boolean) |
| badge_fast_shipping | A flag in... |
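As a first step, the columns above can be loaded and inspected with pandas. This is a minimal sketch: the inline sample rows are invented for illustration, and with the real data you would read the CSV file named above instead.

```python
import io
import pandas as pd

# Invented sample rows covering a few of the documented columns; with the real
# data, use: df = pd.read_csv("summer-products-with-rating-and-performance_2020-08.csv")
sample = io.StringIO(
    "title,price,retail_price,units_sold,rating,rating_count\n"
    "Summer Dress,8.0,14.0,100,4.1,54\n"
    "Swim Shorts,11.0,22.0,20,3.8,6\n"
)
df = pd.read_csv(sample)

print(df.dtypes)                               # check types against the dictionary above
print(df[["price", "units_sold"]].describe())  # basic descriptive statistics
```

From here, `describe()` and `dtypes` give a quick sanity check that the file parsed as the data dictionary describes before any deeper analysis.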
WARNING: This is a pre-release dataset and its field names and data structures are subject to change. It should be considered pre-release until the end of 2024. Expected changes:
- Metadata is missing or incomplete for some layers at this time and will be continuously improved.
- We expect to update this layer roughly in line with CDTFA at some point, but will increase the update cadence over time as we are able to automate the final pieces of the process.

This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.

Purpose
County and incorporated place (city) boundaries along with third-party identifiers used to join in external data. Boundaries are from the authoritative source, the California Department of Tax and Fee Administration (CDTFA), altered to show the counties as one polygon. This layer displays the city polygons on top of the county polygons so the area isn't interrupted. The GEOID attribute information is added from the US Census. GEOID is based on merged State and County FIPS codes for the counties. Abbreviations for counties and cities were added from Caltrans Division of Local Assistance (DLA) data. Place Type was populated with information extracted from the Census. Names and IDs from the US Board on Geographic Names (BGN), the authoritative source of place names as published in the Geographic Name Information System (GNIS), are attached as well. Finally, the coastline is used to separate coastal buffers from the land-based portions of jurisdictions.
This feature layer is for public use.

Related Layers
This dataset is part of a grouping of many datasets:
- Cities: Only the city boundaries and attributes, without any unincorporated areas
  - With Coastal Buffers
  - Without Coastal Buffers
- Counties: Full county boundaries and attributes, including all cities within as a single polygon
  - With Coastal Buffers
  - Without Coastal Buffers
- Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.
  - With Coastal Buffers (this dataset)
  - Without Coastal Buffers
- Place Abbreviations
- Unincorporated Areas (Coming Soon)
- Census Designated Places (Coming Soon)
- Cartographic Coastline
  - Polygon
  - Line source (Coming Soon)

Working with Coastal Buffers
The dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the authoritative source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating if it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except COASTAL, Area_SqMi, Shape_Area, and Shape_Length to get a version with the correct identifiers.

Point of Contact
California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov

Field and Abbreviation Definitions
- COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system
- Place Name: CDTFA incorporated (city) or county name
- County: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
- Legal Place Name: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System
- GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.
- GEOID: numeric geographic identifiers from the US Census Bureau
- Place Type: Board on Geographic Names authorized nomenclature for boundary type published in the Geographic Name Information System
- Place Abbr: CalTrans Division of Local Assistance abbreviations of incorporated area names
- CNTY Abbr: CalTrans Division of Local Assistance abbreviations of county names
- Area_SqMi: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.
- COASTAL: Indicates if the polygon is a coastal buffer. Null for land polygons. Additional values include "ocean" and "bay".
- GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

Accuracy
CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates.
The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations. COUNTY = county name; CITY = city name or unincorporated territory; COPRI = county number followed by the 3-digit city primary number used in the California State Board of Equalization's 6-digit tax rate area numbering system (for the purpose of this map, unincorporated areas are assigned 000 to indicate that the area is not within a city).

Boundary Processing
These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

Slivers
In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, in addition to others. More information on this algorithm will be provided soon.

Coastline Caveats
Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into San Diego Bay, which our shoreline encloses. If you have feedback on the exclusion of these items, or others, from the shoreline cuts, please reach out using the contact information above.

Offline Use
This service is fully enabled for sync and export using Esri Field Maps or other similar tools. Importantly, the GlobalID field exists only to support that use case and should not be used for any other purpose (see note in field descriptions).

Updates and Date of Processing
Concurrent with CDTFA updates, approximately every two weeks. Last Processed: 12/17/2024 by Nick Santos using code path at https://github.com/CDT-ODS-DevSecOps/cdt-ods-gis-city-county/ at commit 0bf269d24464c14c9cf4f7dea876aa562984db63. It incorporates updates from CDTFA as of 12/12/2024.
Future updates will include improvements to metadata and update frequency.
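The attribute side of the dissolve described above can be sketched with pandas: group the land polygon row and its coastal-buffer row by the persistent identifiers, summing the geometry-derived area fields and dropping COASTAL. The rows below are invented for illustration (the GNIS_ID value is hypothetical), and the actual geometry merge would be done with a GIS Dissolve tool, e.g. in ArcGIS or geopandas.

```python
import pandas as pd

# Invented attribute rows: one land polygon plus one ocean buffer for the
# same jurisdiction, keyed by the persistent GEOID/GNIS_ID identifiers.
rows = pd.DataFrame({
    "GEOID":      ["06075", "06075"],
    "GNIS_ID":    [1234567, 1234567],   # hypothetical identifier
    "Place_Name": ["Example City", "Example City"],
    "COASTAL":    [None, "ocean"],
    "Area_SqMi":  [46.9, 5.2],
})

# Dissolve on every field except COASTAL and the area fields; the area is
# summed across the land and buffer polygons.
dissolved = (
    rows.groupby(["GEOID", "GNIS_ID", "Place_Name"], as_index=False)["Area_SqMi"]
        .sum()
)
print(dissolved)  # one row per jurisdiction, buffer area folded in
```

Because the non-geometry attributes are identical between the land and buffer polygons, grouping on the persistent identifiers is lossless, which is exactly why the source recommends dissolving on all fields except COASTAL and the area/length fields.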
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a large dataset which contains the labour market statistics data series published in the monthly Labour Market Statistics Statistical Bulletin. The dataset is overwritten every month and it therefore always contains the latest published data. The Time Series dataset facility is primarily designed for users who wish to customise their own datasets. For example, users can create a single spreadsheet including series for unemployment, claimant count, employment and workforce jobs, rather than extracting the required data from several separate spreadsheets published on the website.
The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases. Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter or that it had this same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user should always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list. This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC.
**Business Problem Overview**
Let us say that Reliance Jio Infocomm Limited approached us with a problem. There is a general tendency in the telecom industry for customers to actively switch from one operator to another. Because telecom is highly competitive, the industry experiences an average annual churn rate of 18-27%. Since it costs 7-12 times more to acquire a new customer than to retain an existing one, customer retention is more important than customer acquisition, which is why our client, Jio, wants to retain its highly profitable customers and thus wishes to predict which customers have a high risk of churning. Also, since a postpaid customer usually informs the operator before shifting their business to a competitor's platform, our client is more concerned about its prepaid customers, who usually churn or shift their business to a different operator without informing them. This results in lost business because Jio cannot offer any promotional scheme in time to prevent churning. As per Jio, there are two kinds of churning: revenue-based and usage-based. Revenue-based churners are customers who have not utilized any revenue-generating facilities, such as mobile data usage, outgoing calls, caller tunes, SMS, etc., over a given period of time. To identify such customers, Jio usually uses an aggregate metric like "customers who have generated less than ₹7 per month in total revenue". However, the disadvantage of such a metric is that many Jio customers who use their services only for incoming calls would also be treated as churned, since they do not generate direct revenue. In such scenarios, revenue is generated by their relatives, who also use the Jio network to call them. For example, many users in rural areas only receive calls from their wage-earning siblings in urban areas.
The other type of churn, as per our client, is usage-based, which covers customers who do not use any of their services, i.e., no calls (either incoming or outgoing), no internet usage, no SMS, etc. The problem with this segment is that by the time one realizes a customer is not utilizing any of the services, it may be too late to take corrective measures, since the customer might already have switched to another operator. Our client, Reliance Jio Infocomm Limited, has asked us to help them predict customers who will churn based on the usage-based definition. Another aspect to bear in mind is that, as per Jio, 80% of their revenue is generated by their top 20% of customers. They call this group high-value customers. Thus, if we can help reduce churn among the high-value customers, we will be able to reduce significant revenue leakage. For this, they want us to define high-value customers based on a usage metric and predict usage-based churn only for high-value customers in the prepaid segment.

Understanding the Data-set
The data-set contains customer-level information for a span of four consecutive months: June, July, August and September, encoded as 6, 7, 8 and 9, respectively. The business objective is to predict churn in the last (i.e., the ninth) month using the data (features) from the first three months. To do this task well, it helps to understand typical customer behavior during churn.

Understanding Customer Behavior During Churn
Customers usually do not decide to switch to a competitor instantly, but rather over a period of time (this is especially applicable to high-value customers). In churn prediction, we assume there are three phases of the customer lifecycle:
1) The 'good' phase: In this phase, the customer is happy with the service and behaves as usual.
2) The 'action' phase: The customer experience starts to sour in this phase; for example, he/she gets a compelling offer from a competitor, faces unjust charges, or becomes unhappy with service quality. In this phase, the customer usually shows different behavior than in the 'good' months. It is crucial to identify high-churn-risk customers in this phase, since corrective actions can still be taken at this point (such as matching the competitor's offer or improving service quality).
3) The 'churn' phase: In this phase, the customer is said to have churned. You define churn based on this phase. Note that at the time of prediction (i.e., the action months), this data is not available to you. Thus, after tagging churn as 1/0 based on this phase, you discard all data corresponding to this phase. In this case, since you are working over a four-month window, the first two months are the 'good' phase, the third month is the 'action' phase, and the fourth month is the 'churn' phase.

Data Dictionary
The data-set is available in a csv file named "Company Data.csv" and the da...
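One plausible way to operationalize the high-value definition is to rank customers by their average spend during the 'good' phase months and keep the top slice. This is a sketch only: the recharge column names and the 70th-percentile cutoff are assumptions for illustration, not something the description of "Company Data.csv" guarantees.

```python
import pandas as pd

# Toy frame; the total_rech_amt_<month> column names (spend per encoded
# month) are hypothetical stand-ins for whatever the real file provides.
df = pd.DataFrame({
    "customer_id":      [1, 2, 3, 4],
    "total_rech_amt_6": [500, 50, 300, 10],
    "total_rech_amt_7": [700, 60, 200, 0],
})

# Average spend across the two 'good phase' months (June = 6, July = 7).
avg_good = df[["total_rech_amt_6", "total_rech_amt_7"]].mean(axis=1)

# Tag the top spenders (here, at or above the 70th percentile) as high-value;
# only these customers would then be carried into churn modeling.
cutoff = avg_good.quantile(0.70)
df["high_value"] = avg_good >= cutoff
print(df[df["high_value"]])
```

The key design point is that the cutoff is computed only from good-phase months, so the high-value label cannot leak information from the action or churn phases.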
By Jeffrey Mvutu Mabilama [source]
The Summer Products and Sales Performance dataset is a comprehensive collection of product listings, ratings, and sales data from the Wish platform. The dataset aims to provide insights into the trends and patterns in e-commerce during the summer season. It contains valuable information such as product titles, prices, retail prices, currency used for pricing, units sold, whether ad boosts are used for product listings, average ratings for products, total ratings count for products, counts of five-star to one-star ratings for products.
Additionally, the dataset includes data on various aspects related to product quality and shipping options such as badges count (indicating special qualities), local product status (whether the product is sold locally), product quality rating badges (indicating the quality of the product), fast shipping availability badges (indicating whether fast shipping is available), tags associated with products (making them more discoverable), color variations of products available in inventory along with their count. It also provides information on different shipping options including option names and their corresponding prices.
Moreover, the dataset encompasses details about the merchants selling these products, including the merchant title and name, the merchant rating count (total number of ratings received by the merchant), merchant profile picture availability, and a subtitle giving additional details about the merchant.
The dataset further includes links to images of individual listed products along with links to the respective online shop pages where they are found. In addition, currency_buyer specifies the currency type used by buyers in transactions. Items flagged with urgency text have an associated urgency text rate indicating how urgently they are desired or needed.
This dataset also allows users to analyze units sold per listed item as well as mean units sold per listed item across different categories/themes. Further evaluation can be done using the total units sold variable, which represents the total sales volume across all items listed on the Wish platform.
To aid analysis around price elasticity, users can find markdown rates describing discounts over the retail price (ranging from 0 to 1) as well as average discount values for individual listed products, plus further custom insights such as the number of countries items can be delivered to, their origin country, whether they carry an urgency banner or fast shipping, and whether the seller is famous or has a profile picture.
This dataset served to build a model helping sellers predict how well an item may sell, equipping businesses to make replenishment decisions guided by that model.
Familiarize Yourself with the Columns:
- Before diving into data analysis, it's important to understand the meaning of each column in the dataset. The columns contain information such as product titles, prices, ratings, inventory details, shipping options, merchant information, and more. Refer to the dataset documentation or use descriptive statistics methods to gain insights into different attributes.
Explore Product Categories:
- The dataset includes a column named theme that represents the category or theme of each product listing. By analyzing this column's values and frequency distribution, you can identify top-selling categories during the summer season. This information can be beneficial for businesses looking to optimize their product offerings.
Analyze Pricing Data:
- Columns like price, retail_price, and currency_buyer provide insights into the pricing strategies employed by sellers on the Wish platform.
- Calculate various statistical measures, such as the mean price using 'meanproductprices', the highest-priced items using 'price', and the average discount using 'averagediscount'.
- Investigate relationships between pricing factors such as discounted prices compared to original retail prices ('discounted price' = 'retail_price' - 'price').
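The pricing measures above can be computed directly from the documented columns; here is a minimal pandas sketch with invented prices, using the discount formula given above (discounted price = retail_price - price).

```python
import pandas as pd

# Invented mini-frame with the documented pricing columns; with the real
# data, load the CSV first and select these columns.
df = pd.DataFrame({
    "price":        [8.0, 11.0, 5.0],
    "retail_price": [14.0, 22.0, 5.0],
})

# Absolute discount and fractional markdown (0 to 1) relative to retail price.
df["discounted_price"] = df["retail_price"] - df["price"]
df["discount_rate"] = df["discounted_price"] / df["retail_price"]

print(df["price"].mean())          # mean listed price
print(df.nlargest(2, "price"))     # highest-priced items
print(df["discount_rate"].mean())  # average discount
```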
Examine Ratings Data:
- Analyze Product Ratings: To gauge customer satisfaction levels regarding products listed on the Wish platform, rating features have been provided. Available columns:
  - Number of ratings received per star rating
  - Total number of ratings received (rating_count)
  - Average rating (rating)
- Perform analysis to find: Aver...
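Using the star-count columns from the data dictionary, an average rating can be recomputed from the distribution and compared against the provided rating column as a consistency check. The counts below are invented for illustration.

```python
import pandas as pd

# Invented star-count columns, named as in the data dictionary above.
df = pd.DataFrame({
    "rating_five_count":  [30, 2],
    "rating_four_count":  [10, 1],
    "rating_three_count": [5, 1],
    "rating_two_count":   [4, 1],
    "rating_one_count":   [5, 1],
})

stars = [5, 4, 3, 2, 1]
cols = ["rating_five_count", "rating_four_count", "rating_three_count",
        "rating_two_count", "rating_one_count"]

# Weighted average over the star distribution; this should roughly agree
# with the provided `rating` column for each product.
total = df[cols].sum(axis=1)
weighted = sum(df[c] * s for c, s in zip(cols, stars))
df["recomputed_rating"] = weighted / total
print(df["recomputed_rating"])
```

Large gaps between the recomputed and the provided average would flag listings whose rating columns are inconsistent and worth excluding from analysis.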
SUMMARY:
Vumonic provides its clients email receipt datasets on weekly, monthly, or quarterly subscriptions, for any online consumer vertical. We gain consent-based access to our users' email inboxes through our own proprietary apps, from which we gather and extract all the email receipts and put them into a structured format for our clients to consume. We currently have over 1M users in our India panel.
If you are not familiar with email receipt data, it provides item and user-level transaction information (all PII-wiped), which allows for deep granular analysis of things like marketshare, growth, competitive intelligence, and more.
VERTICALS:
PRICING/QUOTE:
Our email receipt data is priced market-rate based on the requirement. To give a quote, all we need to know is:
Send us this info and we can answer any questions you have, provide a sample, and more.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
by Vizonix
This dataset differentiates between 4 similar object classes: 4 types of canned goods. We built this dataset with cans of olives, beans, stewed tomatoes, and refried beans.
The dataset is pre-augmented. That is to say, all required augmentations are applied to the native dataset before training. We have found that augmenting this way gives our users maximum visibility and flexibility in tuning their dataset (and classifier) to achieve their specific use-case goals. Augmentations are present and visible in the native dataset before it reaches the classifier, so it is never a mystery which augmentation tweaks produce a more positive or negative outcome during training. It also eliminates the risk of downsizing affecting annotations.
The training images in this dataset were created in our studio in Florida from actual physical objects to the following specifications:
The training images in this dataset were composited / augmented in this way:
1,600 (+) different images were uploaded for each class (out of the 25,000 total images created for each class).
Understanding our Dataset Insights File
As users train their classifiers, they often wish to enhance accuracy by experimenting with or tweaking their dataset. With our Dataset Insights documents, they can easily determine which images possess which augmentations. Dataset Insights allow users to easily add or remove images with specific augmentations as they wish. This also provides a detailed profile and inventory of each file in the dataset.
The Dataset Insights document enables the user to see exactly which source image, angle, augmentation(s), etc. were used to create each image in the dataset.
Dataset Insight Files:
About Vizonix
Vizonix (vizonix.com) builds from-scratch datasets from 100% in-house photography. Our images and backgrounds are generated in our Florida studio. We typically image smaller items, deliver in 72 hours, and specialize in Manufacturer Quality Assurance (MQA) datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1st Dec 2024. This version of the dataset has been superseded and is now restricted. Please refer to the most recent release.
Pollution of online social spaces caused by rampant dis/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent social media data, hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.
Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.
Here is a description of the dataset files.
If used for research purposes, please cite the following paper describing the dataset details:
Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year Worth of Social Data". PLOS ONE (2024). https://doi.org/10.1371/journal.pone.0310330
Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.
Users included in the Bluesky dataset have the right to opt out and request the removal of their data, in accordance with GDPR provisions (Article 17). It should be noted, however, that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides derogations (Article 17(3)(d) and Article 89).
We emphasize that, in compliance with GDPR (Article 4(5)), the released data has been thoroughly pseudonymized. Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to further protect individual privacy.
If you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with subject "Removal request: [username]").
We will process your request within a reasonable timeframe.
This work is supported by:
Assume you are a Data Analyst in an EdTech company. Your company is focused on accelerating its growth by increasing the number of enrolled users.
Therefore, you have been asked to analyze various aspects of customer acquisition to see the status of new users’ growth in your company. The insights you discover will help your business team in designing a better marketing strategy for your company.
Your recommendations must be backed by data insights and professional visualizations to help your business team design road maps, strategies, and action items to achieve their goals.
You are given a month’s data for your analysis. This dataset contains the details of the leads in various stages of the customer acquisition flow.
Lead - Awareness - Consideration - Conversion
lead_basic_details: Contains details of the leads.
sales_managers_assigned_leads_details: Contains the details of the senior and junior sales managers and their assigned leads.
leads_interaction_details: Contains the details of call interactions of junior sales managers with the leads.
leads_demo_watched_details: Contains the details of the demo session watched by the leads.
leads_reasons_for_no_interest: Contains the details of the reasons given by the leads for their lack of interest.
lead_basic_details
- lead_id: unique id of the lead [string]
- age: age of the lead [int]
- gender: gender of the lead [string]
- current_city: city of residence of the lead [string]
- current_education: current education details of the lead [string]
- parent_occupation: occupation of the parent of the lead [string]
- lead_gen_source: source from which the lead is generated [string]

sales_managers_assigned_leads_details
- snr_sm_id: unique id of the senior sales manager [string]
- jnr_sm_id: unique id of the junior sales manager [string]
- assigned_date: date at which certain leads are assigned to the junior sales manager [date]
- cycle: cycle in which the lead is assigned [string]
- lead_id: unique id of the lead [string]

leads_interaction_details
- jnr_sm_id: unique id of the junior sales manager [string]
- lead_id: unique id of the lead [string]
- lead_stage: stage of the lead when contacted by the junior sales manager [string]
- call_done_date: date of the call made to the lead by the junior sales manager [date]
- call_status: status of the call made to the lead [string]
- call_reason: reason for calling the lead [string]
Possible values for call_reason:
- lead_introduction
- demo_scheduled
- demo_not_attended
- after_demo_followup
- followup_for_consideration
- interested_for_conversion
- followup_for_conversion
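As a sketch of how these tables link together, a hypothetical pandas join using column names from the schema above; the rows are invented purely for illustration:

```python
import pandas as pd

# Tiny illustrative rows standing in for the real tables (values are made up).
leads = pd.DataFrame({
    "lead_id": ["L1", "L2", "L3"],
    "lead_gen_source": ["social_media", "website", "referral"],
})
assignments = pd.DataFrame({
    "lead_id": ["L1", "L2", "L3"],
    "jnr_sm_id": ["JNR1", "JNR1", "JNR2"],
})
interactions = pd.DataFrame({
    "lead_id": ["L1", "L1", "L2"],
    "lead_stage": ["lead", "awareness", "lead"],
    "call_status": ["successful", "successful", "unsuccessful"],
})

# One row per call, enriched with the lead's source and assigned manager.
funnel = (
    interactions
    .merge(assignments, on="lead_id", how="left")
    .merge(leads, on="lead_id", how="left")
)

# Successful-call counts per junior sales manager.
calls_per_manager = (
    funnel[funnel["call_status"] == "successful"]
    .groupby("jnr_sm_id")["lead_id"]
    .count()
)
```

The same merge pattern extends to the demo-watched and no-interest tables, since every file shares lead_id as the linking key.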
The data asset is relational. There are four different data files. One represents customer information. A second contains address information. A third contains demographic data, and a fourth includes customer cancellation information. All of the data sets have linking ids, either ADDRESS_ID or CUSTOMER_ID. The ADDRESS_ID is specific to a postal service address. The CUSTOMER_ID is unique to a particular individual. Note that there can be multiple customers assigned to the same address. Also, note that not all customers have a match in the demographic table. The latitude-longitude information generally refers to the Dallas-Fort Worth Metroplex in North Texas and is mappable at a high level. Just be aware that if you drill down too far, some people may live in the middle of Jerry World, DFW Airport, or Lake Grapevine. Any lat/long pointing to a specific residence, business, or physical site is coincidental. The physical addresses are fake and are unrelated to the lat/long.
In the termination table, you can derive a binary target (churned/did not churn) from the ACCT_SUSPD_DATE field. The data set is modelable; that is, you can use the other fields in the data to predict who did and did not churn. The underlying logic behind the prediction should be consistent with predicting auto insurance churn in the real world.
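A minimal sketch of deriving that binary churn target, assuming the termination table is loaded as a pandas DataFrame (the rows here are illustrative):

```python
import pandas as pd

# Illustrative termination rows; in the real data, a non-null
# ACCT_SUSPD_DATE means the account was suspended (churned).
terminations = pd.DataFrame({
    "CUSTOMER_ID": [101, 102, 103],
    "ACCT_SUSPD_DATE": ["2021-03-15", None, "2021-06-01"],
})

# Binary churn target: 1 if a suspension date exists, else 0.
terminations["churn"] = terminations["ACCT_SUSPD_DATE"].notna().astype(int)
```

The resulting churn column can then serve as the label for any classifier trained on the customer, address, and demographic tables.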
Terms and Conditions Unless otherwise stated, the data on this site is free. It can be duplicated and used as you wish, but we'd appreciate it if you credited us as the source.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
TfL statement: We've committed to making our open data freely available to third parties and to engaging developers to deliver new products, apps and services for our customers. Over 11,000 developers have registered for our open data, consisting of our unified API (Application Programming Interface) that powers over 600 travel apps in the UK, with over 46% of Londoners using apps powered by our data. This enables millions of journeys in London each day, giving customers the right information at the right time through their channel of choice.

Why are we committing to open data?
- Public data: As a public body, our data is publicly owned.
- Reach: Our goal is to ensure any person needing travel information about London can get it wherever and whenever they wish, in any way they wish.
- Economic benefit: Open data facilitates the development of technology enterprises and small and medium businesses, generating employment and wealth for London and beyond.
- Innovation: By having thousands of developers working on designing and building applications, services and tools with our data and APIs, we are effectively crowdsourcing innovation.

How is our open data presented? Data is presented in three main ways:
- Static data files: data files which rarely change.
- Feeds: data files refreshed at regular intervals.
- API (Application Programming Interface): enabling a query from an application to receive a bespoke response, depending on the parameters supplied. Find out more about our unified API.

Data is presented as XML wherever possible.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Automotive Industry: The model can be used in quality control scenarios in the automotive manufacturing industry, to automatically detect different wheel types and monitor for any mismatches or defects.
Traffic Monitoring Systems: The application can be extended to intelligent traffic surveillance systems. It can be used to identify different types of vehicles based on their wheels and help in tracking or route analysis of specific vehicle categories.
Digital Asset Management: In content creation domains such as video game development or movie production, the 'Wheels' model could be used to categorize and sort digital assets, making it easier for designers to find specific wheel types or styles for their projects.
E-Commerce Platforms: The model can be utilized to tag and identify images of cars, bicycles, or other vehicles and their components on online retail platforms, which can help customers find specific wheel types they wish to purchase.
Disability Aid Software: It could also be used in the development of assistive technologies for people with visual impairments. The model could detect and communicate the types of vehicles nearby based on the style of their wheels.
Assume you are a data analyst in an EdTech company. The company’s customer success team works with an objective to help customers get the maximum value from their product by doing deeper dives into the customer's needs, wants and expectations from the product and helping them reach their goals.
The customer success team is aiming to achieve sustainable growth by focusing on retaining the existing users.
Therefore, your team wants to analyze the activity of your existing users and understand their performance, behaviours, and patterns to gain meaningful insights that help your customer success team make data-informed decisions.
Your recommendations must be backed by meaningful insights and professional visualizations which will help your customer success team design road maps, strategies, and action items to achieve the goal.
The dataset contains the basic details of the enrolled users, their learning resource completion percentages, their activities on the platform, and the structure of learning resources available on the platform.
1. **users_basic_details**: Contains basic details of the enrolled users.
2. **day_wise_user_activity**: Contains the details of the day-wise learning activity of the users.
- A user has at most one entry per lesson per day.
3. **learning_resource_details**: Contains the details of learning resources offered to the enrolled users.
- Content is stored in a hierarchical structure: Track → Course → Topic → Lesson. A lesson can be a video, practice, exam, etc.
- Example: Tech Foundations → Developer Foundations → Topic 1 → Lesson 1
4. **feedback_details**: Contains the feedback details/rating given by the user to a particular lesson.
- Feedback rating is given on a scale of 1 to 5, 5 being the highest.
- A user can give feedback on the same lesson multiple times.
5. **discussion_details**: Contains the details of the discussions created by the user for a particular lesson.
6. **discussion_comment_details**: Contains the details of the comments posted for the discussions created by the user.
- Comments may be posted by mentors or by the users themselves.
- The role of mentors is to guide and help the users by resolving the doubts and issues they face in their learning activity.
- A discussion can have multiple comments.
users_basic_details:
- user_id: unique id of the user [string]
- gender: gender of the enrolled user [string]
- current_city: city of residence of the user [string]
- batch_start_datetime: start datetime of the batch for which the user is enrolled [datetime]
- referral_source: referral channel of the user [string]
- highest_qualification: highest qualification (education details) of the enrolled user [string]

day_wise_user_activity:
- activity_datetime: date and time of the user's learning activity [datetime]
- user_id: unique id of the user [string]
- lesson_id: unique id of the lesson [string]
- lesson_type: type of the lesson; one of "SESSION", "PRACTICE", "EXAM" or "PROJECT" [string]
- day_completion_percentage: percentage of the lesson completed by the user on a particular day (out of 100%) [float]
- overall_completion_percentage: overall completion percentage of the lesson to date by the user (out of 100%) [float]

Example (one user on one lesson across four days):
- Day 1: day_completion_percentage 10%, overall_completion_percentage 10%
- Day 2: day_completion_percentage 35%, overall_completion_percentage 45%
- Day 3: day_completion_percentage 37%, overall_completion_percentage 82%
- Day 4: day_completion_percentage 18%, overall_completion_percentage 100%

learning_resource_details:
- track_id: unique id of the track [string]
- track_title: name of the track [string]
- course_id: unique id of the course [string]
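The relationship between the two completion fields can be sketched in pandas: overall_completion_percentage behaves like a running total of the daily percentages per user and lesson. The rows below are illustrative:

```python
import pandas as pd

# Day-wise completion rows for one user on one lesson (illustrative values).
activity = pd.DataFrame({
    "user_id": ["U1"] * 4,
    "lesson_id": ["LSN1"] * 4,
    "day_completion_percentage": [10.0, 35.0, 37.0, 18.0],
})

# overall_completion_percentage is the running total of the daily
# percentages for each (user, lesson) pair, capped at 100.
activity["overall_completion_percentage"] = (
    activity.groupby(["user_id", "lesson_id"])["day_completion_percentage"]
    .cumsum()
    .clip(upper=100.0)
)
```

Grouping before the cumulative sum keeps each user-lesson pair's running total independent of every other pair's.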
By Marcos Dias [source]
This dataset from Boston Airbnb provides you with an in-depth look into the experiences of past customers and insightful reviews about their stay. It includes detailed information such as the date of the review, the reviewer's name, their specific comments, and more! Allowing you to better understand each customer's personal opinion on a property, this dataset is perfect for those who wish to gain a comprehensive understanding of what makes Airbnb experiences so special. Not only do real estate investors have access to important data points like price and location, but they can now gain insider information on what made past customers truly happy with their stay - and that could make all the difference in building a successful business! Dive into this remarkable collection of reviews now.
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset of Boston AirBNB Reviews provides detailed insights into the guest experience with a wide range of AirBNBs in Boston. It provides key information that can be used to identify areas of strength and improvement when developing hospitality strategies in this area.
- Using the reviewer names and comments to create a sentiment analysis tool that rates hosts and listings based on customers' experiences, helping potential guests make informed choices when booking.
- Combining the reviews with other data such as prices, availability, and amenities, enabling customers to compare cost efficiency against their specific needs in one platform.
- Analyzing the comments made by reviewers over time to track trends in customer concerns or feedback, allowing Airbnb hosts to adjust their services accordingly.
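As a rough sketch of the first idea, here is a minimal lexicon-based sentiment scorer for the comments column. The word lists are invented for illustration; a production tool would use a trained model such as VADER or a fine-tuned classifier:

```python
# Illustrative sentiment lexicons (not exhaustive; assumptions, not real data).
POSITIVE = {"great", "clean", "friendly", "perfect", "comfortable"}
NEGATIVE = {"dirty", "noisy", "rude", "broken", "disappointing"}

def sentiment_score(comment: str) -> int:
    """Positive minus negative word count; > 0 leans positive."""
    words = comment.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Great location and a very clean apartment",
    "The room was dirty and the street was noisy",
]
scores = [sentiment_score(c) for c in reviews]
```

Averaging such scores per listing or host would give the kind of rating the use case describes, at the cost of missing negation and context that a trained model would capture.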
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: reviews.csv

| Column name | Description |
|:--------------|:----------------------------------------------------------------|
| date | Date of the review submission. (Date) |
| reviewer_name | Name of the reviewer. (String) |
| comments | Comments made by the reviewer about their experience. (String) |
If you use this dataset in your research, please credit Marcos Dias.
This dataset analyzes differences between subscribers and non-subscribers for a bike-sharing company: Cyclistic. The goal is to gain insights in order to convert non-subscribers to subscribers.
The data itself was provided by Cyclistic and you can find the files in the Data Explorer section.
The business task is to analyze how subscribers and non-subscribers use the bike-sharing service and to give three recommendations on how to convert non-subscribers to subscribers.
The data sources used are .csv files that contain bike-sharing usage for the first quarter of 2019 and 2020.
The two .csv files presented data differently (for example, the file for 2019 refers to Subscribers as "Subscriber" whereas the file for 2020 refers to them as "member"). I manipulated the data so that the two files used the same column names, classes, and cell content names.
I also manipulated the data to see the length of each ride. From there, I was able to determine the amount of rides and average ride length for both subscribers and non-subscribers.
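That derivation can be sketched in pandas, assuming the normalised schema uses column names like user_type, started_at, and ended_at (the actual files name these differently per year; the rows below are invented):

```python
import pandas as pd

# Illustrative trips; in the real files the column names differ between
# 2019 and 2020 and were first normalised to a shared schema.
trips = pd.DataFrame({
    "user_type": ["subscriber", "subscriber", "casual"],
    "started_at": pd.to_datetime(
        ["2020-01-01 08:00", "2020-01-01 17:30", "2020-01-04 10:00"]),
    "ended_at": pd.to_datetime(
        ["2020-01-01 08:12", "2020-01-01 17:41", "2020-01-04 11:10"]),
})

# Ride length in minutes, then counts and average duration per user type.
trips["ride_length_min"] = (
    (trips["ended_at"] - trips["started_at"]).dt.total_seconds() / 60
)
summary = trips.groupby("user_type")["ride_length_min"].agg(["count", "mean"])
```

On the real data, this one groupby yields both findings discussed below: ride counts by user type and average ride length by user type.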
From the analysis, it became clear that subscribers take many more rides, while non-subscribers take much longer rides. One reason may be that subscribers can take as many rides as they wish, so they use the service more often; non-subscribers have to pay for each use, so they only use the service when they really need it for longer trips.
Embedded are two visuals. The first displays the amount of rides taken by both subscribers and non-subscribers and the second displays average ride lengths.
![Number of rides by user type](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F7588669%2F238a31fe16040ac08fb3a59f1d617b57%2FNUMBER%20OF%20RIDES.png?generation=1724273515139071&alt=media)
![Average ride duration by user type](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F7588669%2Fec6344c8c51219cfa1512d8747e19a96%2FAVERAGE%20DURATION.png?generation=1724273529213838&alt=media)
By OECD [source]
This dataset provides key indicators regarding the general government debt of OECD countries. These figures reflect the fact that an unfortunate portion of a country’s annual budget is allocated to repaying debt, and suggest different levels of financial stability across nations. By examining this data, we can observe fluctuations in public debt levels over time, as well as how various countries compare in terms of their general government debt. Featuring measurements for multiple years, these data points provide valuable insight into how borrowing affects the overall financial landscapes of the countries captured. Additionally, a flag code system separately gauges the data’s accuracy and credibility to ensure that only reliable readings are observed.
This dataset contains indicators on general government debt for OECD members and selected non-members. The indicators included in this dataset are useful in measuring country-level financial stability and understanding the sources of government financing. This dataset is a great resource for researchers, policy makers, and data journalists who are interested in analyzing trends in public finances across countries.
The data is organized into columns including LOCATION, INDICATOR, SUBJECT, MEASURE, FREQUENCY (annual or quarterly), TIME (year), Value (in national currency units where applicable), and Flag Codes, which denote the accuracy of a given value’s measurement. To access specific information, such as the values associated with an indicator like “Gross Debt of General Government”, use the filtering options to select the regions you want to compare. The results page shows multiple graphs from which users can export individual numbers or view and download all datasets related to particular subgroups. Users can also generate tables to compare numerical results rather than graphical ones; each entry shows details up to 2018, along with values published over various years when available.
It is important to take note of any flag codes, as they indicate why data may be missing from specific points in a series; considering this context is good practice when comparing results across countries. We also recommend that advanced users download and read the raw CSV files linked in this description to understand how variables were originally recorded, and, where possible, confirm details against more authoritative sources such as national treasury departments before drawing conclusions from any graphical comparisons. Now let's get started analyzing!
- Computing gender-disaggregated government debt levels to reveal systemic imbalances such as gender inequality in government spending.
- Estimating the amount of money spent on infrastructure projects by specific OECD countries over a certain period of time.
- Modeling and predicting future macroeconomic trends in terms of general government debt, for use in investment and financial planning activities.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - **Give appropriate cr...
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Product Exchange/Bartering Dataset encompasses data from peer-to-peer trade activities on various recommendation platforms like Tradesy, Ratebeer, and Gameswap. This dataset is a rich resource for examining user behaviors, preferences, and the dynamics of product exchanges in online bartering communities.
Basic Statistics:
- Tradesy: 128,152 users; 68,543 transactions
- Ratebeer: 2,215 users; 125,665 transactions
- Gameswap: 9,888 users; 3,470 transactions
Metadata:
- Peer-to-Peer Trades: Transaction data showcasing the exchange of items between users.
- "Have" and "Want" Lists: Lists indicating the products users have and the products they wish to acquire.
- Image Data (Tradesy): Image data related to the products being traded on the Tradesy platform.
Example (Tradesy):
Each entry in the dataset provides information about the user's activity on the platform, including lists of bought, selling, want, and sold items.
```json
{
  "lists": {
    "bought": ["466", "459", "457", "449"],
    "selling": [],
    "want": [],
    "sold": ["104", "103", "102"]
  },
  "uid": "2"
}
```
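A hypothetical sketch of loading one such record and counting the user's items, assuming each record is stored as JSON (the record literal mirrors the example above):

```python
import json

# One Tradesy-style record, as a JSON string (mirrors the example entry).
record = json.loads(
    '{"lists": {"bought": ["466", "459", "457", "449"], '
    '"selling": [], "want": [], "sold": ["104", "103", "102"]}, "uid": "2"}'
)

# Count how many items this user has bought and sold.
bought = len(record["lists"]["bought"])
sold = len(record["lists"]["sold"])
```

Iterating this over every user would give the per-user activity counts that the recommender-system and behavior-analysis use cases below start from.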
Download Links:
- Tradesy: Download Link (3.8 MB)
- Ratebeer and Gameswap: Project Page for Download
Citation: If you use this dataset, please cite the following paper:
1. Title: Bartering books to beers: A recommender system for exchange platforms. Authors: Jérémie Rappaz, Maria-Luiza Vladarean, Julian McAuley, Michele Catasta. Published in: WSDM, 2017. Link to paper
Use Cases:
1. Recommender System Development: Building recommendation systems to suggest products users might want to barter based on their past interactions and preferences.
2. User Behavior Analysis: Analyzing user behavior in peer-to-peer trading platforms to understand the dynamics of online bartering and exchange.
3. Product Matching: Developing algorithms to match products for exchange, enhancing the efficiency of the bartering process.
4. Community Detection: Identifying communities of users with similar trading behaviors or preferences, which could lead to more efficient bartering networks.
5. Marketplace Design: Improving the design of online marketplaces and trading platforms to facilitate better user experiences and more successful trades.
6. Supply and Demand Analysis: Analyzing supply and demand trends for various products within these online bartering communities.
7. Visual Analysis: In the case of Tradesy, leveraging image data to perform visual analysis of products, which can be used for visual-based recommendation or product categorization.
8. Sentiment Analysis: Extracting and analyzing user-generated content to gauge sentiment towards particular products or trading experiences.
9. Economic Research: Studying online bartering as a form of economic activity, and understanding how digital platforms are shaping modern trade practices.
10. Fraud Detection: Identifying anomalous behaviors or potential fraud in trading activities.
The Product Exchange/Bartering Dataset is instrumental for researchers and practitioners aiming to delve into the realm of online bartering systems, understand user exchange behaviors, and develop recommender systems to facilitate peer-to-peer exchanges.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains detailed information about a wide range of books available for purchase on an online retailer's website. It includes data such as book titles, authors, categories, prices, stock status, number of copies left, book length in pages, edition details, publication information, and customer engagement metrics like wished-users counts and discount offers. This dataset is ideal for data analysis projects focusing on book sales trends, customer preferences, and market insights within the online retail book industry. Whether you're exploring pricing strategies, customer behavior, or genre popularity, this dataset provides a rich resource for data-driven exploration and analysis in the domain of online book retailing.

Content:
Book Title: Title of the book.
Author: Author(s) of the book.
Category: Category or genre of the book.
Price (TK): Price of the book in TK (local currency).
Stock Status: Availability status of the book (In Stock/Out of Stock).
Copies Left: Number of copies currently available.
Book Length (Pages): Number of pages in the book.
Edition: Edition details of the book.
Publication: Publisher or publication details.
Wished Users: Number of users who have added this book to their wish list.
Discount Offer: Any available discount or promotional offer on the book.
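A brief pandas sketch of one analysis these columns support, such as ranking in-stock titles by wish-list popularity. The rows are invented for illustration; only the column names come from the list above:

```python
import pandas as pd

# A few illustrative rows using the column names listed above.
books = pd.DataFrame({
    "Book Title": ["Book A", "Book B", "Book C"],
    "Price (TK)": [350, 500, 420],
    "Stock Status": ["In Stock", "Out of Stock", "In Stock"],
    "Wished Users": [120, 45, 300],
})

# In-stock titles ranked by how often they appear on wish lists.
in_stock = books[books["Stock Status"] == "In Stock"]
most_wished = in_stock.sort_values("Wished Users", ascending=False)
```

The same filter-then-rank pattern applies to the other metrics, e.g. ranking by Copies Left to flag titles close to selling out.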
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
In today's digital environment, customers freely express their thoughts through online reviews, which have a significant impact on how people perceive products and make purchase decisions. Companies use this feedback to learn what consumers enjoy, dislike, and expect from their goods and services. This dataset has been assembled to offer insight into customer experiences and sentiments for data analysts, researchers, and machine learning enthusiasts who wish to investigate consumer behavior and sentiment trends.
This dataset includes customer reviews along with key variables such as review text, ratings, sentiment labels, and other fields that capture the customer experience. Each record represents a customer's feedback on a product and indicates whether the customer had a positive, negative, or neutral experience. Sentiment analysis, text categorization, EDA, data visualization, and predictive model construction can all be done with this dataset. It provides an organized perspective on actual customer feedback to assist in identifying trends in customer preferences, product performance, and satisfaction.
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
By Jeffrey Mvutu Mabilama [source]
This dataset brings you closer than ever to the reality of top-selling products and their performance in e-commerce platforms. It gives you detailed lists of each product's features, ratings, sales, reviews and other metrics so that you can understand what makes a successful summer product on Wish. With this data at hand, you have access to not only a curated list of top summer products but also to the power of analytics for boosting your business operations.
This dataset contains information about summer product listings, ratings, and sales performance data on the Wish e-commerce platform. Using this information you will be able to understand how well certain products sell, the average price of products in the summer season and many more interesting insights that can be gained from this dataset.
- Estimating the optimal pricing strategy for a product based on its ratings, merchant ratings count, mean discount and other metrics. This would help businesses to determine which pricing strategy would produce the most profits while still keeping customers interested in their products.
- Analyzing the performance of seasonal summer products by studying correlations between their ratings, units sold, prices, and other metrics, allowing businesses to identify trends more accurately and improve sales strategies accordingly.
- Tracking sellers’ reputations across different countries through analysis of customer reviews for each product they list, in order to better understand how location affects sales performance and to evaluate customer satisfaction with particular sellers regarding shipping times or product quality.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: summer-products-with-rating-and-performance_2020-08.csv

| Column name | Description |
|:----------------------|:------------------------------------------------------------------------|
| title | The title of the product. (String) |
| title_orig | The original title of the product. (String) |
| price | The price of the product. (Float) |
| retail_price | The original retail price of the product. (Float) |
| currency_buyer | The currency of the buyer. (String) |
| units_sold | The number of units sold. (Integer) |
| uses_ad_boosts | A flag indicating if the product has been boosted using ads. (Boolean) |
| rating | The rating of the product. (Float) |
| rating_count | The total number of ratings for the product. (Integer) |
| rating_five_count | The number of five star ratings for the product. (Integer) |
| rating_four_count | The number of four star ratings for the product. (Integer) |
| rating_three_count | The number of three star ratings for the product. (Integer) |
| rating_two_count | The number of two star ratings for the product. (Integer) |
| rating_one_count | The number of one star ratings for the product. (Integer) |
| badges_count | The number of badges associated with the product. (Integer) |
| badge_local_product | A flag indicating if the product is a local product. (Boolean) |
| badge_product_quality | A flag indicating if the product has a quality badge. (Boolean) |
| badge_fast_shipping | A flag in... |
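A short pandas sketch of the kind of pricing analysis described in the ideas above, using a few of these columns. The rows are invented for illustration; on the real data you would read the CSV named above instead:

```python
import pandas as pd

# Illustrative rows with a subset of the columns from the table above;
# on the real data: products = pd.read_csv("summer-products-with-rating-and-performance_2020-08.csv")
products = pd.DataFrame({
    "price": [8.0, 12.0, 5.5],
    "retail_price": [10.0, 30.0, 6.0],
    "units_sold": [100, 5000, 1000],
    "rating": [3.5, 4.2, 3.9],
})

# Discount relative to the listed retail price.
products["discount_pct"] = (
    (products["retail_price"] - products["price"]) / products["retail_price"] * 100
)

# Simple check: do larger discounts go with more units sold?
corr = products["discount_pct"].corr(products["units_sold"])
```

A positive correlation would support a discount-driven pricing strategy, though on three invented rows this is only a demonstration of the method, not a finding.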