100+ datasets found
  1. Sample Leads Dataset

    • kaggle.com
    zip
    Updated Jun 24, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ThatSean (2022). Sample Leads Dataset [Dataset]. https://www.kaggle.com/datasets/thatsean/sample-leads-dataset
    Explore at:
    zip(22640 bytes)Available download formats
    Dataset updated
    Jun 24, 2022
    Authors
    ThatSean
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I had modified this dataset to support an upcoming Towards Data Science article walking through the process. Link to be shared once published.

  2. B

    Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  3. Sample Sales Data (5 million transactions)

    • kaggle.com
    zip
    Updated Jul 8, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chris Chua (2021). Sample Sales Data (5 million transactions) [Dataset]. https://www.kaggle.com/datasets/weitat/sample-sales
    Explore at:
    zip(201186399 bytes)Available download formats
    Dataset updated
    Jul 8, 2021
    Authors
    Chris Chua
    Description

    Dataset

    This dataset was created by Chris Chua

    Contents

  4. s

    Snowplow Modeled Customer Data Sample

    • snowplow.io
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Snowplow Analytics, Snowplow Modeled Customer Data Sample [Dataset]. https://snowplow.io/explore-snowplow-data-part-2
    Explore at:
    Dataset authored and provided by
    Snowplow Analytics
    Time period covered
    Apr 1, 2020 - Apr 3, 2020
    Variables measured
    user_id, mkt_source, page_views, session_id, conversions, geo_country, device_class, mkt_campaign, session_length, time_engaged_in_s
    Description

    Example of modeled customer behavioral data showing user sessions, engagement metrics, and conversion data across multiple platforms and devices

  5. E-Commerce Sales

    • kaggle.com
    Updated Oct 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Prince Rajak (2025). E-Commerce Sales [Dataset]. https://www.kaggle.com/datasets/prince7489/e-commerce-sales
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 4, 2025
    Dataset provided by
    Kaggle
    Authors
    Prince Rajak
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a synthetic but realistic sample of e-commerce sales for an online store, covering the period from 2024 to 2025. It includes details about orders, customers, products, regions, pricing, discounts, sales, profit, and payment modes.

    It is designed for data analysis, visualization, and machine learning projects. Beginners and advanced users can use this dataset to practice:

    Exploratory Data Analysis (EDA)

    Sales trend analysis

    Profit margin and discount analysis

    Customer segmentation

    Predictive modeling (e.g., sales or profit prediction)

  6. Dataset #1: Cross-sectional survey data

    • figshare.com
    txt
    Updated Jul 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adam Baimel (2023). Dataset #1: Cross-sectional survey data [Dataset]. http://doi.org/10.6084/m9.figshare.23708730.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Adam Baimel
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    N.B. This is not real data. Only here for an example for project templates.

    Project Title: Add title here

    Project Team: Add contact information for research project team members

    Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

    Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

    Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

    Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

    Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

    Research Project Materials: What materials are necessary to fully reproduce your the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

    List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

    Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).

    Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14

  7. h

    multilingual-data-sample

    • huggingface.co
    Updated Feb 24, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OrbitalsAI LTD (2025). multilingual-data-sample [Dataset]. https://huggingface.co/datasets/orbitalsai/multilingual-data-sample
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    OrbitalsAI LTD
    Description

    orbitalsai/multilingual-data-sample dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. w

    Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observation were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  9. h

    Data from: sample-dataset

    • huggingface.co
    Updated Jan 20, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Reddy (2025). sample-dataset [Dataset]. https://huggingface.co/datasets/Amitej/sample-dataset
    Explore at:
    Dataset updated
    Jan 20, 2025
    Authors
    Reddy
    License

    https://choosealicense.com/licenses/unknown/https://choosealicense.com/licenses/unknown/

    Description

    Amitej/sample-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. Feature Engineering Data

    • kaggle.com
    zip
    Updated Jul 23, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mat Leonard (2019). Feature Engineering Data [Dataset]. https://www.kaggle.com/datasets/matleonard/feature-engineering-data
    Explore at:
    zip(205019058 bytes)Available download formats
    Dataset updated
    Jul 23, 2019
    Authors
    Mat Leonard
    Description

    This dataset is a sample from the TalkingData AdTracking competition. I kept all the positive examples (where is_attributed == 1), while discarding 99% of the negative samples. The sample has roughly 20% positive examples.

    For this competition, your objective was to predict whether a user will download an app after clicking a mobile app advertisement.

    File descriptions

    train_sample.csv - Sampled data

    Data fields

    Each row of the training data contains a click record, with the following features.

    • ip: ip address of click.
    • app: app id for marketing.
    • device: device type id of user mobile phone (e.g., iphone 6 plus, iphone 7, huawei mate 7, etc.)
    • os: os version id of user mobile phone
    • channel: channel id of mobile ad publisher
    • click_time: timestamp of click (UTC)
    • attributed_time: if user download the app for after clicking an ad, this is the time of the app download
    • is_attributed: the target that is to be predicted, indicating the app was downloaded

    Note that ip, app, device, os, and channel are encoded.

    I'm also including Parquet files with various features for use within the course.

  11. w

    Websites using Sample Data

    • webtechsurvey.com
    csv
    Updated Oct 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WebTechSurvey (2025). Websites using Sample Data [Dataset]. https://webtechsurvey.com/technology/sample-data
    Explore at:
    csvAvailable download formats
    Dataset updated
    Oct 11, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/termshttps://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Sample Data technology, compiled through global website indexing conducted by WebTechSurvey.

  12. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.

  13. Electrical Product Sample Sales Data

    • kaggle.com
    zip
    Updated Jan 18, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Murat Mutlu (2022). Electrical Product Sample Sales Data [Dataset]. https://www.kaggle.com/datasets/muratmutlubi/electrical-product-sample-sales-data
    Explore at:
    zip(43201106 bytes)Available download formats
    Dataset updated
    Jan 18, 2022
    Authors
    Murat Mutlu
    Description

    Dataset

    This dataset was created by Murat Mutlu

    Released under Data files © Original Authors

    Contents

  14. m

    Indeterminate Likert Scale - Sample Dataset - Customer Feedback of...

    • data.mendeley.com
    Updated Dec 23, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ilanthenral Kandasamy (2018). Indeterminate Likert Scale - Sample Dataset - Customer Feedback of Restaurant [Dataset]. http://doi.org/10.17632/ywjxpyw95w.1
    Explore at:
    Dataset updated
    Dec 23, 2018
    Authors
    Ilanthenral Kandasamy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Research Hypothesis: Using the concept of Neutrosophy to deal Indeterminacy in Feedback

    Data: Feedback given by customers of a restaurant. Questionnaire based on six factors, i.e., Quality of Food, Service, Hygiene, Value for money, Ambiance, Overall Experience. Each question (based on the factor) has five membership values as follows: , Positive, Positive Indeterminate, Indeterminate, Negative Indeterminate and Negative.

  15. N

    DOB sample data

    • data.cityofnewyork.us
    • data.wu.ac.at
    csv, xlsx, xml
    Updated Dec 2, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Buildings (DOB) (2025). DOB sample data [Dataset]. https://data.cityofnewyork.us/Housing-Development/DOB-sample-data/bkyx-e5n5
    Explore at:
    xml, xlsx, csvAvailable download formats
    Dataset updated
    Dec 2, 2025
    Authors
    Department of Buildings (DOB)
    Description

    A list of complaints received and associated data. Prior monthly reports are archived at DOB and are not available on NYC Open Data.

  16. Gen Z data tables – total sample; and Gen Z data tables – Gen Z sample -...

    • ckan.publishing.service.gov.uk
    Updated May 18, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ckan.publishing.service.gov.uk (2020). Gen Z data tables – total sample; and Gen Z data tables – Gen Z sample - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/gen-z-data-tables-total-sample-and-gen-z-data-tables-gen-z-sample
    Explore at:
    Dataset updated
    May 18, 2020
    Dataset provided by
    CKANhttps://ckan.org/
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    These two datasets provide the responses to a survey on food including what influences decisions on what people choose to eat, and what is important to people when selecting food for example price, animal welfare, origin of food. Knowledge of the food system Use of technology when purchasing food and key concerns about food. The total sample includes all age groups 16+ and has a sample size of 2475. The Gen Z sample is of generation Z only 16- 25 year olds and has a sample size of 619.

  17. Sample data files for Python Course

    • figshare.com
    txt
    Updated Nov 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Peter Verhaar (2022). Sample data files for Python Course [Dataset]. http://doi.org/10.6084/m9.figshare.21501549.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Nov 4, 2022
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Peter Verhaar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample data set used in an introductory course on Programming in Python

  18. Z

    Cloud-based User Entity Behavior Analytics Log Data Set

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Landauer, Max; Skopik, Florian; Höld, Georg; Wurzenberger, Markus (2023). Cloud-based User Entity Behavior Analytics Log Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7119952
    Explore at:
    Dataset updated
    Oct 30, 2023
    Dataset provided by
    AIT Austrian Institute of Technology
    Authors
    Landauer, Max; Skopik, Florian; Höld, Georg; Wurzenberger, Markus
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This respository contains the CLUE-LDS (CLoud-based User Entity behavior analytics Log Data Set). The data set contains log events from real users utilizing a cloud storage suitable for User Entity Behavior Analytics (UEBA). Events include logins, file accesses, link shares, config changes, etc. The data set contains around 50 million events generated by more than 5000 distinct users in more than five years (2017-07-07 to 2022-09-29 or 1910 days). The data set is complete except for 109 events missing on 2021-04-22, 2021-08-20, and 2021-09-05 due to database failure. The unpacked file size is around 14.5 GB. A detailed analysis of the data set is provided in [1]. The logs are provided in JSON format with the following attributes in the first level:

    id: Unique log line identifier that starts at 1 and increases incrementally, e.g., 1. time: Time stamp of the event in ISO format, e.g., 2021-01-01T00:00:02Z. uid: Unique anonymized identifier for the user generating the event, e.g., old-pink-crane-sharedealer. uidType: Specifier for uid, which is either the user name or IP address for logged out users. type: The action carried out by the user, e.g., file_accessed. params: Additional event parameters (e.g., paths, groups) stored in a nested dictionary. isLocalIP: Optional flag for event origin, which is either internal (true) or external (false). role: Optional user role: consulting, administration, management, sales, technical, or external. location: Optional IP-based geolocation of event origin, including city, country, longitude, latitude, etc. In the following data sample, the first object depicts a successful user login (see type: login_successful) and the second object depicts a file access (see type: file_accessed) from a remote location:

    {"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", "time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", "id": 21567530, "uidType": "name"}

    {"isLocalIP": false, "params": {"path": "/proud-copper-orangutan-artexer/doubtful-plum-ptarmigan-merchant/insufficient-amaranth-earthworm-qualitycontroller/curious-silver-galliform-tradingstandards/incredible-indigo-octopus-printfinisher/wicked-bronze-sloth-claimsmanager/frantic-aquamarine-horse-cleric"}, "type": "file_accessed", "time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, "location": {"countryCode": "AT", "countryName": "Austria", "region": "4", "city": "Gmunden", "latitude": 47.915, "longitude": 13.7959, "timezone": "Europe/Vienna", "postalCode": "4810", "metroCode": null, "regionName": "Upper Austria", "isInEuropeanUnion": true, "continent": "Europe", "accuracyRadius": 50}, "uidType": "ipaddress"} The data set was generated at the premises of Huemer Group, a midsize IT service provider located in Vienna, Austria. Huemer Group offers a range of Infrastructure-as-a-Service solutions for enterprises, including cloud computing and storage. In particular, their cloud storage solution called hBOX enables customers to upload their data, synchronize them with multiple devices, share files with others, create versions and backups of their documents, collaborate with team members in shared data spaces, and query the stored documents using search terms. The hBOX extends the open-source project Nextcloud with interfaces and functionalities tailored to the requirements of customers. The data set comprises only normal user behavior, but can be used to evaluate anomaly detection approaches by simulating account hijacking. We provide an implementation for identifying similar users, switching pairs of users to simulate changes of behavior patterns, and a sample detection approach in our github repo. Acknowledgements: Partially funded by the FFG project DECEPT (873980). The authors thank Walter Huemer, Oskar Kruschitz, Kevin Truckenthanner, and Christian Aigner from Huemer Group for supporting the collection of the data set. If you use the dataset, please cite the following publication: [1] M. Landauer, F. Skopik, G. Höld, and M. Wurzenberger. "A User and Entity Behavior Analytics Log Data Set for Anomaly Detection in Cloud Computing". 2022 IEEE International Conference on Big Data - 6th International Workshop on Big Data Analytics for Cyber Intelligence and Defense (BDA4CID 2022), December 17-20, 2022, Osaka, Japan. IEEE. [PDF]

  19. New 1000 Sales Records Data 2

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
    Explore at:
    zip(49305 bytes)Available download formats
    Dataset updated
    Jan 12, 2023
    Authors
    Calvin Oko Mensah
    Description

    This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.

  20. h

    med-domain-data-sample

    • huggingface.co
    Updated Oct 14, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TextCleanLM (2025). med-domain-data-sample [Dataset]. https://huggingface.co/datasets/textcleanlm/med-domain-data-sample
    Explore at:
    Dataset updated
    Oct 14, 2025
    Dataset authored and provided by
    TextCleanLM
    Description

    textcleanlm/med-domain-data-sample dataset hosted on Hugging Face and contributed by the HF Datasets community

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
ThatSean (2022). Sample Leads Dataset [Dataset]. https://www.kaggle.com/datasets/thatsean/sample-leads-dataset
Organization logo

Sample Leads Dataset

A basic simulated dataset of marketing/sales leads with lead sources

Explore at:
zip(22640 bytes)Available download formats
Dataset updated
Jun 24, 2022
Authors
ThatSean
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I had modified this dataset to support an upcoming Towards Data Science article walking through the process. Link to be shared once published.

Search
Clear search
Close search
Google apps
Main menu