Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.
## Structure
```
Projects/
Java_projects/
eclipse.zip
guava.zip
guice.zip
hadoop.zip
spark.zip
vaadin.zip
Pharo_projects/
images/
GToolkit.zip
Moose.zip
PetitParser.zip
Pillar.zip
PolyMath.zip
Roassal2.zip
Seaside.zip
vm/
70-x64/Pharo
Scripts/
ClassCommentExtraction.st
SampleSelectionScript.st
Python_projects/
django.zip
ipython.zip
Mailpile.zip
pandas.zip
pipenv.zip
pytorch.zip
requests.zip
```
## Contents of the Replication Package
---
**Projects/** contains the raw projects of each language that are used to analyze class comments.
- **Java_projects/**
- `eclipse.zip` - Eclipse project downloaded from the GitHub. More detail about the project is available on GitHub [Eclipse](https://github.com/eclipse).
- `guava.zip` - Guava project downloaded from the GitHub. More detail about the project is available on GitHub [Guava](https://github.com/google/guava).
- `guice.zip` - Guice project downloaded from the GitHub. More detail about the project is available on GitHub [Guice](https://github.com/google/guice)
- `hadoop.zip` - Apache Hadoop project downloaded from the GitHub. More detail about the project is available on GitHub [Apache Hadoop](https://github.com/apache/hadoop)
- `spark.zip` - Apache Spark project downloaded from the GitHub. More detail about the project is available on GitHub [Apache Spark](https://github.com/apache/spark)
- `vaadin.zip` - Vaadin project downloaded from the GitHub. More detail about the project is available on GitHub [Vaadin](https://github.com/vaadin/framework)
- **Pharo_projects/**
- **images/** -
- `GToolkit.zip` - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Moose.zip` - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `PetitParser.zip` - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Pillar.zip` - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `PolyMath.zip` - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Roassal2.zip` - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Seaside.zip` - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- **vm/** -
- **70-x64/Pharo** - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the `images/` folder. The user can run the vm on macOS and select any of the Pharo image.
- **Scripts/** - It contains the sample Smalltalk scripts to extract class comments from various projects.
- `ClassCommentExtraction.st` - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.
- `SampleSelectionScript.st` - A Smalltalk script to show sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the images/ folder.
- **Python_projects/**
- `django.zip` - Django project downloaded from the GitHub. More detail about the project is available on GitHub [Django](https://github.com/django)
- `ipython.zip` - IPython project downloaded from the GitHub. More detail about the project is available on GitHub on [IPython](https://github.com/ipython/ipython)
- `Mailpile.zip` - Mailpile project downloaded from the GitHub. More detail about the project is available on GitHub on [Mailpile](https://github.com/mailpile/Mailpile)
- `pandas.zip` - pandas project downloaded from the GitHub. More detail about the project is available on GitHub on [pandas](https://github.com/pandas-dev/pandas)
- `pipenv.zip` - Pipenv project downloaded from the GitHub. More detail about the project is available on GitHub on [Pipenv](https://github.com/pypa/pipenv)
- `pytorch.zip` - PyTorch project downloaded from the GitHub. More detail about the project is available on GitHub on [PyTorch](https://github.com/pytorch/pytorch)
- `requests.zip` - Requests project downloaded from the GitHub. More detail about the project is available on GitHub on [Requests](https://github.com/psf/requests/)
https://www.gnu.org/copyleft/gpl.htmlhttps://www.gnu.org/copyleft/gpl.html
The compressed package (Study_code.zip) contains the code files implemented by an under review paper ("What you see is what you get: Delineating urban jobs-housing spatial distribution at a parcel scale by using street view imagery based on deep learning technique").The compressed package (input_land_parcel_with_attributes.zip) is the sampled mixed "jobs-housing" attributes data of the study area with multiple probability attributes (Only working, Only living, working and living) at the land parcel scale.The compressed package (input_street_view_images.zip) is the surrounding street view data near sampled land parcels (input_land_parcel_with_attributes.zip) with the pixel size of 240*160 obtained from Tencent map (https://map.qq.com/).The compressed package (output_results.zip) contains the result vector files (Jobs-housing pattern distribution and error distribution) and file description (Readme.txt).This project uses some Python open source libraries (Numpy, Pandas, Selenium, Gdal, Pytorch and sklearn). This project complies with the GPL license.Numpy (https://numpy.org/) is an open source numerical calculation tool developed by Travis Oliphant. Used in this project for matrix operation. This library complies with the BSD license.Pandas (https://pandas.pydata.org/) is an open source library, providing high-performance, easy-to-use data structures and data analysis tools. This library complies with the BSD license.Selenium(https://www.selenium.dev/) is a suite of tools for automating web browsers.Used in this project for getting street view images.This library complies with the BSD license.Gdal(https://gdal.org/) is a translator library for raster and vector geospatial data formats.Used in this project for processing geospatial data.This library complies with the BSD license.Pytorch(https://pytorch.org/) is an open source machine learning framework that accelerates the path from research prototyping to production deployment.Used in this project for deep learning.This library complies with the BSD license.sklearn(https://scikit-learn.org/) is an open source machine learning tool for python.Used in this project for comparing precision metrics.This library complies with the BSD license.
The bank.csv dataset describes about a phone call between customer and customer care staffs who are working for Portuguese banking institution. The dataset is about, whether the customer will get the scheme or product such as bank term deposit. Maximum the data will have βyesβ or βnoβ type data.
The main goal is to predict if clients will subscribe to a term deposit or not.
Bank Client Data: 1 - age: (numeric) 2 - job: type of job (categorical: admin., blue-collar, entrepreneur, housemaid, management, retired, self-employed, services, student, technician, unemployed, unknown) 3 - marital: marital status (categorical: divorced, married, single, unknown; note: divorced means either divorced or widowed) 4 - education: (categorical: basic.4y, basic.6y, basic.9y, high.school, illiterate, professional.course, university.degree, unknown) 5 - default: has credit in default? (categorical: no, yes, unknown) 6 - housing: has housing loan? (categorical: no, yes, unknown) 7 - loan: has personal loan? (categorical: no, yes, unknown)
Related with the Last Contact of the Current Campaign: 8 - contact: contact communication type (categorical: cellular, telephone) 9 - month: last contact month of year (categorical: jan, feb, mar, ..., nov, dec) 10 - day_of_week: last contact day of the week (categorical: mon, tue, wed, thu, fri) 11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
Other Attributes: 12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact) 13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted) 14 - previous: number of contacts performed before this campaign and for this client (numeric) 15 - poutcome: outcome of the previous marketing campaign (categorical: failure, nonexistent, success)
#Social and Economic Context Attributes 16 - emp.var.rate: employment variation rate - quarterly indicator (numeric) 17 - cons.price.idx: consumer price index - monthly indicator (numeric) 18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric) 19 - euribor3m: euribor 3 month rate - daily indicator (numeric) 20 - nr.employed: number of employees - quarterly indicator (numeric)
Output Variable (Desired Target): 21 - y (deposit): - has the client subscribed a term deposit? (binary: yes, no) -> changed column title from '***y***' to '***deposit***'
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
The Olympics Data Analysis project explores historical Olympic data using Exploratory Data Analysis (EDA) techniques. By leveraging Python libraries such as pandas, seaborn, and matplotlib, the project uncovers patterns in medal distribution, athlete demographics, and country-wise performance.
Key findings reveal that most medalists are aged between 20-30 years, with USA, China, and Russia leading in total medals. Over time, female participation has increased significantly, reflecting improved gender equality in sports. Additionally, athlete characteristics like height and weight play a crucial role in certain sports, such as basketball (favoring taller players) and gymnastics (favoring younger athletes).
The project includes interactive visualizations such as heatmaps, medal trends, and gender-wise participation charts to provide a comprehensive understanding of Olympic history and trends. The insights can help sports analysts, researchers, and enthusiasts better understand performance patterns in the Olympics.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a collection of user reviews and ratings for dating applications, primarily sourced from the Google Play Store for the Indian region between 2017 and 2022. It offers valuable insights into user sentiment, evolving trends, and common feedback regarding dating apps. The data is particularly useful for practising Natural Language Processing (NLP) tasks such as sentiment analysis, topic modelling, and identifying user concerns.
The dataset is typically provided in a CSV file format. It contains a substantial number of records, estimated to be around 527,000 individual reviews. This makes it suitable for large-scale data analysis and machine learning projects. The dataset structure is tabular, with clearly defined columns for review content, metadata, and user feedback. Specific row/record counts are not exact but are indicated by the extensive range of index labels.
This dataset is ideally suited for a variety of analytical and machine learning applications: * Analysing trends in dating app usage and perception over the years. * Determining which dating applications receive more favourable responses and if this consistency has changed over time. * Identifying common issues reported by users who give low ratings (below 3/5). * Investigating the correlation between user enthusiasm and their app ratings. * Performing sentiment analysis on review texts to gauge overall user sentiment. * Developing Natural Language Processing (NLP) models for text classification, entity recognition, or summarisation. * Examining the perceived usefulness of top-rated reviews. * Understanding user behaviour and preferences across different dating apps.
The dataset primarily covers user reviews from the Google Play Store, specifically for the Indian country region ('in'), despite being titled as "all regions" in some contexts. The data spans a time range from 2017 to 2022, offering a multi-year perspective on dating app trends and user feedback. There are no specific demographic details for the reviewers themselves beyond their reviews and ratings.
CCO
This dataset is suitable for: * Data Scientists and Analysts: For conducting deep dives into user sentiment, trend analysis, and predictive modelling. * NLP Practitioners and Researchers: As a practical dataset for training and evaluating natural language processing models, especially for text classification and sentiment analysis tasks. * App Developers and Product Managers: To understand user feedback, identify areas for improvement in their own or competing dating applications, and inform product development strategies. * Market Researchers: To gain insights into the consumer behaviour and preferences within the online dating market. * Students and Beginners: It is tagged as 'Beginner' friendly, making it a good resource for those new to data analysis or NLP projects.
Original Data Source: Dating Apps Reviews 2017-2022 (all regions)
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a udacity data science course project. In this project is did an analysis about open data of Bay Area Bake Share. Ford GoBike is the Bay Area's new bike share system, with thousands of public bikes for use across San Francisco, East Bay and San Jose. Theirs bike share is designed with convenience in mind; itβs a fun and affordable way to get around town.
In this project was did be many analysis that they ask me. And in the final of project, I did two simple analysis. In my first analysis, I present a rainy day influence on the trips. The analysis show that a rainy day influence in a trips reduction. In my second analysis I show trips quantity per week day in San Francisco. I show too trips quantity per subscriber type. And last I present trips quantity for each subscriber type per week day. It was possible to observe that trips are lower in weekends and bigger in weekdays. The data show us too that in the weekdays, most of the trips are made by annual subscribers and, in the weekend, a most are made by customers. I did create some functions that can help somebody, and it's possible check many examples about python and pandas.
This was my first project in data science. I continue to study and learn and hope to improve more and more.
Thank's.
Our main question is to find out what San Francisco's road safety problems are and what the city is doing to fix them. Our first approach is to see if there is any correlation between specific populations by census tract and the collision rates. If the approach fails, the alternative is to look at how the collision rates are correlated with the public safety projects. By looking at how the projects have impacted road safety, we can assess whether the city is on the right track with the projects, or if the projects are a waste of time and money. Our original proposal was to analyze traffic in San Francisco. That was when we assumed we were able to use the data from Uber Movement. Due to certain constraints that will be mentioned in the Data Sources section, we were unable to perform such analysis. Hence, we switched to analyzing road safety instead.Notable Modules Used: Python: pandas, geopandas, shapely, matplotlib, scipy ArcGIS: aggregate_points
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This is a realistic and structured pizza sales dataset covering the time span from **2024 to 2025. ** Whether you're a beginner in data science, a student working on a machine learning project, or an experienced analyst looking to test out time series forecasting and dashboard building, this dataset is for you.
π Whatβs Inside? The dataset contains rich details from a pizza business including:
β Order Dates & Times β Pizza Names & Categories (Veg, Non-Veg, Classic, Gourmet, etc.) β Sizes (Small, Medium, Large, XL) β Prices β Order Quantities β Customer Preferences & Trends
It is neatly organized in Excel format and easy to use with tools like Python (Pandas), Power BI, Excel, or Tableau.
π‘** Why Use This Dataset?** This dataset is ideal for:
π Sales Analysis & Reporting π§ Machine Learning Models (demand forecasting, recommendations) π Time Series Forecasting π Data Visualization Projects π½οΈ Customer Behavior Analysis π Market Basket Analysis π¦ Inventory Management Simulations
π§ Perfect For: Data Science Beginners & Learners BI Developers & Dashboard Designers MBA Students (Marketing, Retail, Operations) Hackathons & Case Study Competitions
pizza, sales data, excel dataset, retail analysis, data visualization, business intelligence, forecasting, time series, customer insights, machine learning, pandas, beginner friendly
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.
## Structure
```
Projects/
Java_projects/
eclipse.zip
guava.zip
guice.zip
hadoop.zip
spark.zip
vaadin.zip
Pharo_projects/
images/
GToolkit.zip
Moose.zip
PetitParser.zip
Pillar.zip
PolyMath.zip
Roassal2.zip
Seaside.zip
vm/
70-x64/Pharo
Scripts/
ClassCommentExtraction.st
SampleSelectionScript.st
Python_projects/
django.zip
ipython.zip
Mailpile.zip
pandas.zip
pipenv.zip
pytorch.zip
requests.zip
```
## Contents of the Replication Package
---
**Projects/** contains the raw projects of each language that are used to analyze class comments.
- **Java_projects/**
- `eclipse.zip` - Eclipse project downloaded from the GitHub. More detail about the project is available on GitHub [Eclipse](https://github.com/eclipse).
- `guava.zip` - Guava project downloaded from the GitHub. More detail about the project is available on GitHub [Guava](https://github.com/google/guava).
- `guice.zip` - Guice project downloaded from the GitHub. More detail about the project is available on GitHub [Guice](https://github.com/google/guice)
- `hadoop.zip` - Apache Hadoop project downloaded from the GitHub. More detail about the project is available on GitHub [Apache Hadoop](https://github.com/apache/hadoop)
- `spark.zip` - Apache Spark project downloaded from the GitHub. More detail about the project is available on GitHub [Apache Spark](https://github.com/apache/spark)
- `vaadin.zip` - Vaadin project downloaded from the GitHub. More detail about the project is available on GitHub [Vaadin](https://github.com/vaadin/framework)
- **Pharo_projects/**
- **images/** -
- `GToolkit.zip` - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Moose.zip` - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `PetitParser.zip` - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Pillar.zip` - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `PolyMath.zip` - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Roassal2.zip` - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Seaside.zip` - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- **vm/** -
- **70-x64/Pharo** - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the `images/` folder. The user can run the vm on macOS and select any of the Pharo image.
- **Scripts/** - It contains the sample Smalltalk scripts to extract class comments from various projects.
- `ClassCommentExtraction.st` - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.
- `SampleSelectionScript.st` - A Smalltalk script to show sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the images/ folder.
- **Python_projects/**
- `django.zip` - Django project downloaded from the GitHub. More detail about the project is available on GitHub [Django](https://github.com/django)
- `ipython.zip` - IPython project downloaded from the GitHub. More detail about the project is available on GitHub on [IPython](https://github.com/ipython/ipython)
- `Mailpile.zip` - Mailpile project downloaded from the GitHub. More detail about the project is available on GitHub on [Mailpile](https://github.com/mailpile/Mailpile)
- `pandas.zip` - pandas project downloaded from the GitHub. More detail about the project is available on GitHub on [pandas](https://github.com/pandas-dev/pandas)
- `pipenv.zip` - Pipenv project downloaded from the GitHub. More detail about the project is available on GitHub on [Pipenv](https://github.com/pypa/pipenv)
- `pytorch.zip` - PyTorch project downloaded from the GitHub. More detail about the project is available on GitHub on [PyTorch](https://github.com/pytorch/pytorch)
- `requests.zip` - Requests project downloaded from the GitHub. More detail about the project is available on GitHub on [Requests](https://github.com/psf/requests/)