8 datasets found
  1. Synthetic User Event Log (Object Datetime Format)

    • kaggle.com
    Updated Apr 11, 2025
    Jayanta Nath (2025). Synthetic User Event Log (Object Datetime Format) [Dataset]. https://www.kaggle.com/datasets/jayaantanaath/synthetic-user-event-log-object-datetime-format
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jayanta Nath
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This synthetic dataset contains 100 user event logs with a focus on datetime handling in Python. The timestamp column is intentionally stored as an object (string) to help learners practice:

    Converting strings to datetime objects using pd.to_datetime

    Extracting features like hour, day, weekday, etc.

    Handling datetime formatting and manipulation

    Performing time-based grouping and filtering

    🧪 Ideal for:

    Python beginners

    Pandas learners

    Data wrangling practice

    Building beginner Kaggle notebooks

    💡 Tip: Start by converting the timestamp to datetime format and see what insights you can extract from user behavior!
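
The practice steps listed in the description can be sketched roughly as follows. This is a minimal illustration, not the dataset author's notebook: the sample rows, the column names `user_id` and `event`, and the timestamp format are assumptions, since the description only says that a timestamp column is stored as a string.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's columns and
# timestamp format may differ.
events = pd.DataFrame(
    {
        "user_id": [1, 2, 1, 3],
        "event": ["login", "view", "logout", "login"],
        "timestamp": [
            "2025-04-01 08:15:00",
            "2025-04-01 09:30:00",
            "2025-04-01 17:45:00",
            "2025-04-02 08:05:00",
        ],
    }
)

# Convert the string column to datetime64 values.
events["timestamp"] = pd.to_datetime(
    events["timestamp"], format="%Y-%m-%d %H:%M:%S"
)

# Extract calendar features through the .dt accessor.
events["hour"] = events["timestamp"].dt.hour
events["day"] = events["timestamp"].dt.day
events["weekday"] = events["timestamp"].dt.day_name()

# Time-based filtering: keep events that happened before noon.
morning = events[events["hour"] < 12]

# Time-based grouping: count events per calendar day.
events_per_day = events.groupby(events["timestamp"].dt.date).size()

print(events)
print(events_per_day)
```

Passing an explicit `format` is optional, but it makes parsing stricter and usually faster when all rows share one layout.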

  2. COVID-19 Scholarly Production Dataset

    • data.mendeley.com
    Updated Jul 7, 2020
    Gisliany Alves (2020). COVID-19 Scholarly Production Dataset [Dataset]. http://doi.org/10.17632/kx7wwc8dzp.5
    Dataset updated
    Jul 7, 2020
    Authors
    Gisliany Alves
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    COVID-19 has been recognized as a global threat, and several studies are being conducted to contribute to the fight against and prevention of this pandemic. This work presents a scholarly production dataset focused on COVID-19. It provides an overview of scientific research activities, making it possible to identify the countries, scientists and research groups most active in this task force to combat the coronavirus disease. The dataset is composed of 40,212 records of articles' metadata collected from the Scopus, PubMed, arXiv and bioRxiv databases from January 2019 to July 2020. The data were extracted with Python web scraping and preprocessed with Pandas data wrangling.

  3. Data from: SBIR - STTR Data and Code for Collecting Wrangling and Using It

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Allard, Grant (2023). SBIR - STTR Data and Code for Collecting Wrangling and Using It [Dataset]. http://doi.org/10.7910/DVN/CKTAZX
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Allard, Grant
    Description

    Data set consisting of data joined for analyzing the SBIR/STTR program. Data consists of individual awards and agency-level observations. The R and Python code required for pulling, cleaning, and creating useful data sets has been included.

    Allard_Get and Clean Data.R: This file provides the code for getting, cleaning, and joining the numerous data sets that this project combined. This code is written in the R language and can be used in any R environment running R 3.5.1 or higher. If the other files in this Dataverse are downloaded to the working directory, then this R code will be able to replicate the original study without needing the user to update any file paths.

    Allard SBIR STTR WebScraper.py: This is the code I deployed to multiple Amazon EC2 instances to scrape data on each individual award in my data set, including the contact info and DUNS data.

    Allard_Analysis_APPAM SBIR project: Forthcoming

    Allard_Spatial Analysis: Forthcoming

    Awards_SBIR_df.Rdata: This unique data set consists of 89,330 observations spanning the years 1983 - 2018 and accounting for all eleven SBIR/STTR agencies. This data set consists of data collected from the Small Business Administration's Awards API and also unique data collected through web scraping by the author.

    Budget_SBIR_df.Rdata: 246 observations for 20 agencies across 25 years of their budget-performance in the SBIR/STTR program. Data was collected from the Small Business Administration using the Annual Reports Dashboard, the Awards API, and an author-designed web crawler of the websites of awards.

    Solicit_SBIR-df.Rdata: This data consists of observations of solicitations published by agencies for the SBIR program. This data was collected from the SBA Solicitations API.

    Primary Sources

    Small Business Administration. “Annual Reports Dashboard,” 2018. https://www.sbir.gov/awards/annual-reports.

    Small Business Administration. “SBIR Awards Data,” 2018. https://www.sbir.gov/api.

    Small Business Administration. “SBIR Solicit Data,” 2018. https://www.sbir.gov/api.

  4. PythonLibraries|WheelFiles

    • kaggle.com
    Updated Mar 25, 2024
    Ravi Ramakrishnan (2024). PythonLibraries|WheelFiles [Dataset]. https://www.kaggle.com/datasets/ravi20076/pythonlibrarieswheelfiles/code
    Dataset updated
    Mar 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ravi Ramakrishnan
    License

    https://cdla.io/sharing-1-0/

    Description

    Hello all,
    This dataset is my humble attempt to allow myself and others to upgrade essential Python packages to their latest versions. It contains the .whl files of the packages below, for use across general kernels and especially in internet-off code challenges:

    Package                    Version               Functionality
    AutoGluon                  1.0.0                 AutoML models
    Catboost                   1.2.2, 1.2.3          ML models
    Iterative-Stratification   0.1.7                 Iterative stratification for multi-label classifiers
    Joblib                     1.3.2                 File dumping and retrieval
    LAMA                       0.3.8b1               AutoML models
    LightGBM                   4.3.0, 4.2.0, 4.1.0   ML models
    MAPIE                      0.8.2                 Quantile regression
    Numpy                      1.26.3                Data wrangling
    Pandas                     2.1.4                 Data wrangling
    Polars                     0.20.3, 0.20.4        Data wrangling
    PyTorch                    2.0.1                 Neural networks
    PyTorch-TabNet             4.1.0                 Neural networks
    PyTorch-Forecast           0.7.0                 Neural networks
    Pygwalker                  0.3.20                Data wrangling and visualization
    Scikit-learn               1.3.2, 1.4.0          ML Models/ Pipelines/ Data wrangling
    Scipy                      1.11.4                Data wrangling/ Statistics
    TabPFN                     10.1.9                ML models
    Torch-Frame                1.7.5                 Neural Networks
    TorchVision                0.15.2                Neural Networks
    XGBoost                    2.0.2, 2.0.1, 2.0.3   ML models


    I plan to update this dataset with more libraries and later versions as they get upgraded in due course. I hope these wheel files are useful to one and all.
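
For readers wondering how wheel files like these are typically used in an internet-off kernel, here is a minimal sketch. It relies on pip's `--no-index` and `--find-links` options; the helper name `build_offline_install_command` and the directory in `WHEEL_DIR` are hypothetical, so check where the dataset is actually mounted in your own notebook.

```python
import sys


def build_offline_install_command(wheel_dir: str, package: str) -> list[str]:
    """Build a pip command that installs `package` from local wheel files only."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                 # never contact a package index
        f"--find-links={wheel_dir}",  # look for .whl files in this directory
        package,
    ]


# Hypothetical mount point; the real path depends on how the dataset is attached.
WHEEL_DIR = "/kaggle/input/pythonlibrarieswheelfiles"

command = build_offline_install_command(WHEEL_DIR, "lightgbm==4.3.0")
print(" ".join(command))

# To actually run the install inside a notebook:
#   import subprocess
#   subprocess.run(command, check=True)
```

Pinning the version (`lightgbm==4.3.0`) matters when several versions of the same package sit in one directory, as they do for some entries in the table.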

    Recent updates based on user feedback-

    1. lightgbm 4.1.0 and 4.3.0
    2. Older XGBoost versions (2.0.1 and 2.0.2)
    3. Torch-Frame, TabNet, PyTorch-Forecasting, TorchVision
    4. MAPIE
    5. LAMA 0.3.8b1
    6. Iterative-Stratification
    7. Catboost 1.2.3

    Best regards and happy learning and coding!

  5. ds-coder-instruct-v1

    • huggingface.co
    Updated Apr 10, 2024
    Edvard Avagyan (2024). ds-coder-instruct-v1 [Dataset]. https://huggingface.co/datasets/ed001/ds-coder-instruct-v1
    Dataset updated
    Apr 10, 2024
    Authors
    Edvard Avagyan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Dataset Card for DS Coder Instruct Dataset

    DS Coder is a dataset for instruction fine-tuning of language models. It is a specialized dataset focusing only on data science (e.g. plotting, data wrangling, machine learning models, deep learning, and numerical computations). The dataset contains code examples in both R and Python. The goal of this dataset is to enable the creation of small-scale, specialized language model assistants for data science projects.

      Dataset Details… See the full description on the dataset page: https://huggingface.co/datasets/ed001/ds-coder-instruct-v1.
    
  6. Netflix

    • kaggle.com
    Updated Jul 30, 2025
    Prasanna@82 (2025). Netflix [Dataset]. https://www.kaggle.com/datasets/prasanna82/netflix
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    Kaggle
    Authors
    Prasanna@82
    Description

    Netflix Dataset Exploration and Visualization

    This project involves an in-depth analysis of the Netflix dataset to uncover key trends and patterns in the streaming platform’s content offerings. Using Python libraries such as Pandas, NumPy, and Matplotlib, this notebook visualizes and interprets critical insights from the data.

    Objectives:

    Analyze the distribution of content types (Movies vs. TV Shows)

    Identify the most prolific countries producing Netflix content

    Study the ratings and duration of shows

    Handle missing values using techniques like interpolation, forward-fill, and custom replacements

    Enhance readability with bar charts, horizontal plots, and annotated visuals

    Key Visualizations:

    Bar charts for type distribution and country-wise contributions

    Handling missing data in rating, duration, and date_added

    Annotated plots showing values for clarity
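
The missing-value techniques named above (custom replacement, interpolation, forward-fill) might look something like the sketch below. The rows are made up, and storing `duration` as numeric minutes is an assumption for illustration; only the column names `rating`, `duration` and `date_added` come from the description.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; not taken from the actual Netflix dataset.
titles = pd.DataFrame(
    {
        "rating": ["PG-13", None, "TV-MA", None],
        "duration": [90.0, np.nan, 110.0, 120.0],  # assumed to be minutes
        "date_added": ["2021-01-05", None, "2021-02-10", "2021-03-01"],
    }
)

# Custom replacement: label missing ratings explicitly.
titles["rating"] = titles["rating"].fillna("Not Rated")

# Interpolation: fill a numeric gap linearly from its neighbours.
titles["duration"] = titles["duration"].interpolate()

# Forward-fill: carry the last known date down into the gap.
titles["date_added"] = pd.to_datetime(titles["date_added"]).ffill()

print(titles)
```

Which technique fits depends on the column: interpolation only makes sense for ordered numeric data, while a placeholder label is usually safer for categories such as ratings.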

    Tools Used:

    Python 3

    Pandas for data wrangling

    Matplotlib for visualizations

    Jupyter Notebook for hands-on analysis

    Outcome: This project provides a clear view of Netflix's content library, helping data enthusiasts and beginners understand how to process, clean, and visualize real-world datasets effectively.

    Feel free to fork, adapt, and extend the work.

  7. Bee Swarm Analysis

    • data.mendeley.com
    Updated Jul 4, 2022
    Kosta Manser (2022). Bee Swarm Analysis [Dataset]. http://doi.org/10.17632/5bmscj7jf7.1
    Dataset updated
    Jul 4, 2022
    Authors
    Kosta Manser
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data collected by E. Hunting et al. comprising video footage and electric field recordings from a video camera and field mill respectively. Data wrangling was done by K. Manser, the author of the python script.

  8. Malaysia Covid-19 Dataset

    • kaggle.com
    Updated Jul 20, 2021
    TanKY (2021). Malaysia Covid-19 Dataset [Dataset]. https://www.kaggle.com/yeanzc/malaysia-covid19-dataset/notebooks
    Dataset updated
    Jul 20, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    TanKY
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Malaysia
    Description

    A free, publicly available Malaysia Covid-19 dataset.

    Data Descriptions

    28 variables, including:

    New case; New case (7 day rolling average); Recovered; Active case; Local cases; Imported case; ICU; Death; Cumulative deaths

    People tested; Cumulative people tested; Positivity rate; Positivity rate (7 day rolling average)
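
As an illustration of how a 7 day rolling average column such as the ones listed above can be derived, here is a small sketch with pandas. The daily counts are invented, and the series name `new_case` is an assumption; the dataset's own computation is not described.

```python
import pandas as pd

# Hypothetical daily new-case counts, one value per day.
new_cases = pd.Series(
    [10, 20, 30, 40, 50, 60, 70, 80],
    index=pd.date_range("2021-01-01", periods=8, freq="D"),
    name="new_case",
)

# Mean over the current day and the six days before it.
# The first six entries are NaN because the window is not yet full.
rolling_avg = new_cases.rolling(window=7).mean()

print(rolling_avg)
```

Passing `min_periods=1` to `rolling` would produce a value from the first day onward, at the cost of averaging over fewer than seven days at the start.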

    Data Sources

    Columns 1 to 22 are Twitter data; the Tweets are retrieved from the Health DG @DGHisham timeline with the Twitter API. A typical covid situation update Tweet is written in a relatively fixed format. Data wrangling is done in Python/Pandas, with numerical values extracted using regular expressions (RegEx). Missing data are added manually from Desk of DG (kpkesihatan).
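
Because the Tweets follow a relatively fixed format, the numerical values can be pulled out with a single pattern. The sketch below shows the general idea only: the Tweet text, the field labels, and the helper `extract_counts` are hypothetical, since the actual Tweet format and the author's expressions are not shown.

```python
import re

# Hypothetical update text; the real Tweets may use different labels and language.
tweet = "COVID-19 update 20 July 2021: New cases: 10,972; Recovered: 5,345; Deaths: 129"

# A label, a colon, then a number that may contain thousands separators.
PATTERN = re.compile(r"(New cases|Recovered|Deaths):\s*([\d,]+)")


def extract_counts(text: str) -> dict[str, int]:
    """Return each labelled count in `text` as an integer."""
    return {
        label: int(value.replace(",", ""))
        for label, value in PATTERN.findall(text)
    }


counts = extract_counts(tweet)
print(counts)
```

Restricting the pattern to known labels keeps it from picking up unrelated numbers such as the date.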

    Column 23 ['remark'] is my own written remark regarding the Tweet status/content.

    Column 24 ['Cumulative people tested'] data is transcribed from an image on MOH COVID-19 website. Specifically, the first image under TABURAN KES section in each Situasi Terkini daily webpage of http://covid-19.moh.gov.my/terkini. If missing, the image from CPRC KKM Telegram or KKM Facebook Live video is used. Data in this column, dated from 1 March 2020 to 11 Feb 2021, are from Our World in Data, their data collection method as stated here.

    Why does this dataset exist?

    MOH does not publish any covid data in csv/excel format as of today; they provide the data as is, along with infographics that are hardly informative. In an undisclosed email, MOH doesn't seem to understand my request that they release the covid public health data for anyone to download and analyze as they wish.

    To be updated periodically

    A simple visualization dashboard is now published on Tableau Public. It is updated daily. Do check it out! More charts will be added in the near future.

    Inspiration

    Create better visualizations to help fellow Malaysians understand the Covid-19 situation. Empower the data science community.

Synthetic User Event Log (Object Datetime Format)

A 100-row dataset for practicing datetime and feature extraction in Python.
