3 datasets found

Daily Machine Learning Practice
Step-by-step machine learning practice using real-world sales data
kaggle.com
zip
Updated Nov 9, 2025
Cite
Astrid Villalobos (2025). Daily Machine Learning Practice [Dataset]. https://www.kaggle.com/datasets/astridvillalobos/daily-machine-learning-practice
Available download formats: zip (1019861 bytes)
Dataset updated
Nov 9, 2025
Authors
Astrid Villalobos
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Daily Machine Learning Practice – 1 Commit per Day

Author: Astrid Villalobos
Location: Montréal, QC
LinkedIn: https://www.linkedin.com/in/astridcvr/

Objective
The goal of this project is to strengthen Machine Learning and data analysis skills through small, consistent daily contributions. Each commit focuses on a specific aspect of data processing, feature engineering, or modeling using Python, Pandas, and Scikit-learn.

Dataset
Source: Kaggle – Sample Sales Data
File: data/sales_data_sample.csv
Variables: ORDERNUMBER, QUANTITYORDERED, PRICEEACH, SALES, COUNTRY, etc.
Goal: Analyze e-commerce performance, predict sales trends, segment customers, and forecast demand.
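As a rough illustration of the kind of analysis the goal describes, the sketch below totals SALES per COUNTRY. It is not taken from the project: the rows are invented and only reuse the column names listed above, the helper name `sales_by_country` is hypothetical, and the standard-library `csv` module is used so the example runs without Pandas.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows that only mimic the column names listed above;
# the values are invented for illustration, not taken from the dataset.
SAMPLE_CSV = """ORDERNUMBER,QUANTITYORDERED,PRICEEACH,SALES,COUNTRY
10001,30,95.70,2871.00,USA
10002,34,81.35,2765.90,France
10003,41,94.74,3884.34,France
10004,45,83.26,3746.70,USA
"""


def sales_by_country(csv_file):
    """Sum the SALES column per COUNTRY from an open CSV file object."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["COUNTRY"]] += float(row["SALES"])
    # Round to cents to hide floating-point noise in the sums.
    return {country: round(total, 2) for country, total in totals.items()}


totals = sales_by_country(io.StringIO(SAMPLE_CSV))
for country, total in sorted(totals.items()):
    print(f"{country}: {total:.2f}")
```

To try this on the real file, an open handle to data/sales_data_sample.csv could presumably be passed in place of the in-memory sample; the file's encoding is not stated in the listing and may need to be set explicitly.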

Project Rules
- 🟩 1 Commit per Day: Minimum one line of code daily to ensure consistency and discipline
- 🌍 Bilingual Comments: Code and documentation in English and French
- 📈 Visible Progress: Daily green squares = daily learning

🧰 Tech Stack
Languages: Python
Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
Tools: Jupyter Notebook, GitHub, Kaggle

Learning Outcomes
By the end of this challenge:
- Develop a stronger understanding of data preprocessing, modeling, and evaluation.
- Build consistent coding habits through daily practice.
- Apply ML techniques to real-world sales data scenarios.

Data from: Using social-ecological models to explore stream connectivity...
verso.uidaho.edu
data.nkn.uidaho.edu
Updated Aug 30, 2023
Cite
Elizabeth Jossie; Travis Seaborn; Colden Baxter; Morey Burnham (2023). Data from: Using social-ecological models to explore stream connectivity outcomes for stakeholders and Yellowstone cutthroat trout [Dataset]. https://verso.uidaho.edu/esploro/outputs/dataset/Data-from-Using-social-ecological-models-to/996765629701851
Dataset updated
Aug 30, 2023
Dataset provided by
University of Idaho, Idaho State University, Idaho EPSCoR, EPSCoR GEM3
Authors
Elizabeth Jossie; Travis Seaborn; Colden Baxter; Morey Burnham
Time period covered
Aug 30, 2023
Description
Data from the 2023 Ecological Applications manuscript: Using social-ecological models to explore stream connectivity outcomes for stakeholders and Yellowstone cutthroat trout.

Input files and R scripts for running YCT connectivity simulations in CDMetaPOP. Full-resolution mental models constructed by Teton Valley stakeholders. Data are accessible from Zenodo and correspond to the v1.0.0 release of the Connectivity_YCT_2022 GitHub repository.

Data Use
License
Open
Recommended Citation
Jossie L, Seaborn T, Baxter CV, Burnham M. 2023. lizziejossie/Connectivity_YCT_2022: YCT_Connectivity_EcologicalApplications (v1.0.0) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.8161826
Funding
US National Science Foundation and Idaho EPSCoR: OIA-1757324
Data from: Data related to Panzer: A Machine Learning Based Approach to...
darus.uni-stuttgart.de
Updated Nov 27, 2024
Cite
Tim Panzer (2024). Data related to Panzer: A Machine Learning Based Approach to Analyze Supersecondary Structures of Proteins [Dataset]. http://doi.org/10.18419/DARUS-4576
Unique identifier
https://doi.org/10.18419/DARUS-4576
Dataset updated
Nov 27, 2024
Dataset provided by
DaRUS
Authors
Tim Panzer
License
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4576
Time period covered
Nov 1, 1976 - Feb 29, 2024
Dataset funded by
DFG
Description
This entry contains the data used for the bachelor thesis, which investigated how embeddings can be used to analyze supersecondary structures.

Abstract of the thesis: This thesis analyzes the behavior of supersecondary structures in the context of embeddings. For this purpose, data from the Protein Topology Graph Library was annotated with embeddings. This resulted in a structured graph database, which will be used for future work and analyses. In addition, different projections into two-dimensional space were made to analyze how the embeddings behave there.

The Jupyter Notebook 1_data_retrival.ipynb contains the download process for the graph files from the Protein Topology Graph Library (https://ptgl.uni-frankfurt.de). The downloaded .gml files can also be found in graph_files.zip. They form graphs that represent the relationships of supersecondary structures in the proteins and are the data basis for further analyses.

These graph files are then processed in the Jupyter Notebook 2_data_storage_and_embeddings.ipynb and entered into a graph database. The sequences of the supersecondary and secondary structures from the PTGL can be found in fastas.zip. The embeddings were calculated using the ESM model of the Facebook Research Group (huggingface.co/facebook/esm2_t12_35M_UR50D) and can be found in three .h5 files; they are subsequently added to the database. The whole process in this notebook serves to build up the database, which can then be searched using Cypher queries.

In the Jupyter Notebook 3_data_science.ipynb, different visualizations and analyses are carried out with the help of UMAP.

To install all dependencies, it is recommended to create a Conda environment and install all packages there. To use the project, PyEED should be installed using the snapshot of the original repository (source repository: https://github.com/PyEED/pyeed). The best way to install PyEED is to execute the pip install -e . command in the pyeed_BT folder. The dependencies can also be installed by using poetry and the .toml file. In addition, seaborn, h5py and umap-learn are required. These can be installed using the following commands:
pip install h5py==3.12.1
pip install seaborn==0.13.2 umap-learn==0.5.7
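After running the install commands, one way to confirm that the environment matches the pinned versions is a small check like the following. This is only a sketch and not part of the dataset: the helper name `check_pins` is hypothetical, and it relies solely on the standard-library `importlib.metadata` module, so it reports a package as missing rather than failing when something is not installed.

```python
from importlib import metadata

# Version pins quoted in the description above.
PINNED = {"h5py": "3.12.1", "seaborn": "0.13.2", "umap-learn": "0.5.7"}


def check_pins(pins):
    """Return {package: status}; status is 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == wanted else f"found {installed}"
    return report


report = check_pins(PINNED)
for name, status in report.items():
    print(f"{name}=={PINNED[name]}: {status}")
```

Any status other than "ok" suggests the Conda environment differs from the one the notebooks were written against.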