License: http://www.gnu.org/licenses/lgpl-3.0.html
On the official website, the dataset is available through a SQL Server instance (localhost) and CSV files used via Power BI Desktop running in the Virtual Lab (virtual machine). The first two steps of importing the data were executed in the virtual lab, and the resulting Power BI tables were then exported to CSVs. Records through the year 2022 have been added as required.
This dataset is helpful if you want to work offline with the Adventure Works data in Power BI Desktop and follow the lab instructions from the training material on the official website, in particular the Power BI Desktop Sales Analysis example from the Microsoft PL-300 learning path.
Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created in the first two steps of importing data, as described in the PL-300 Microsoft Power BI Data Analyst exam lab.
License: https://creativecommons.org/publicdomain/zero/1.0/
The idea to scrape this data came to me while I was working on an end-to-end e-commerce project, Fashion Product Recommendation. In that project, you upload any fashion image and it shows the 10 closest recommendations.
https://user-images.githubusercontent.com/40932902/169657090-20d3342d-d472-48e3-bc34-8a9686b09961.png
https://user-images.githubusercontent.com/40932902/169657035-870bb803-f985-482a-ac16-789d0fcf2a2b.png
https://user-images.githubusercontent.com/40932902/169013855-099838d6-8612-45ce-8961-28ccf44f81f7.png
I completed my project on that image dataset. The problem came while deploying on the Heroku server: due to the large project file size, I was unable to deploy, as Heroku offers limited storage space for a free account.
Currently I am only familiar with Heroku, and I am learning AWS for bigger projects. So I decided to scrape my own image dataset with much more information, which can help take this project to the next level. I scraped the data from flipkart.com (an e-commerce website) in two formats: images and textual data in tabular form.
This dataset contains 65k images (400x450 pixels) of fashion/style products such as clothing, footwear, and accessories. There is also a CSV file mapped to the images through the image name and the id column in the tabular data. Each image has a unique numerical name such as 1.png or 62299.png, and the image name matches the id column. So if you want the details of any image, take its numerical name, look up that value in the id column of the CSV, and that row contains the image's details. You can find the notebook I used to scrape this data in the code section.
Columns of the CSV dataset:
1. id: unique id, same as the image name
2. brand: brand name of the product
3. title: title of the product
4. sold_price: selling price of the product
5. actual_price: actual price of the product
6. url: unique URL of every product
7. img: image URL
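A minimal pandas sketch of the image-to-row lookup described above; the CSV file name is an assumption for illustration:

import pandas as pd

# File name assumed; use the CSV shipped with this dataset
df = pd.read_csv("flipkart_fashion_products.csv")

# An image named 62299.png corresponds to the row whose id is 62299
image_name = "62299.png"
image_id = int(image_name.split(".")[0])

details = df[df["id"] == image_id]
print(details[["brand", "title", "sold_price", "actual_price", "url"]])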
How this dataset helped me:
1. I trained my CNN model using the image data; that is the only use of the image dataset.
2. On the front-end page of the project, I display results using the image URL, fetching each image from the web. This meant I did not have to upload the image dataset to the server with the project, which saved a huge amount of storage.
3. Using the url column, the app displays the live price and ratings from the Flipkart website.
4. There is also a Buy button mapped to the url, which redirects you to the original product page so you can buy it there.
After using this dataset I renamed my project from Fashion Product Recommender to Flipkart Fashion Product Recommender.
Still, the storage problem was not fully resolved, as the trained model file was above 500MB when trained on the complete dataset. So I tried multiple subsets and finally deployed after training on only 1000 images. In the future, I will try another platform to deploy the complete project. I learned many new things while working on this dataset.
To download the same dataset in a smaller size (less than 500MB), you can find it here; everything is the same as this dataset except that the image resolution is reduced from 400x450px to 65x80px.
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Emmanuel Arias.
Released under: Database: Open Database License (ODbL); Contents: Database Contents License (DbCL).
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset provides information about top-rated TV shows, collected from The Movie Database (TMDb) API. It can be used for data analysis, recommendation systems, and insights on popular television content.
Key Stats:
Total Pages: 109
Total Results: 2,098 TV shows
Data Source: TMDb API
Sorting Criteria: highest-rated by vote_average (average rating) with a minimum vote count of 200

Data Fields (Columns):
id: Unique identifier for the TV show
name: Title of the TV show
vote_average: Average rating given by users
vote_count: Total number of votes received
first_air_date: The date when the show was first aired
original_language: Language in which the show was originally produced
genre_ids: Genre IDs linked to the show's genres
overview: A brief summary of the show
popularity: Popularity score based on audience engagement
poster_path: URL path for the show's poster image

Accessing the Dataset via API (Python Example):
import requests

api_key = 'YOUR_API_KEY_HERE'
url = "https://api.themoviedb.org/3/discover/tv"
params = {
    'api_key': api_key,
    'include_adult': 'false',
    'language': 'en-US',
    'page': 1,
    'sort_by': 'vote_average.desc',
    'vote_count.gte': 200
}

response = requests.get(url, params=params)
data = response.json()
print(data['results'][0])
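The request above returns only one page of results; since the stats above mention 109 pages, here is a hedged sketch for collecting every page into a single list. It reuses the url and params defined above and relies on the total_pages field returned by the API:

all_results = []
total_pages = data['total_pages']  # 109 at the time the dataset was collected

for page in range(1, total_pages + 1):
    params['page'] = page
    resp = requests.get(url, params=params)
    all_results.extend(resp.json().get('results', []))

print(len(all_results))  # roughly the 2,098 shows described above

The collected all_results list can then be passed to pd.DataFrame in place of data['results'] in the export example below.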
Dataset Use Cases:

Data Analysis: Explore trends in highly-rated TV shows.
Recommendation Systems: Build personalized TV show suggestions.
Visualization: Create charts to showcase ratings or genre distribution.
Machine Learning: Predict show popularity using historical data.

Exporting and Sharing the Dataset (Google Colab Example):
import pandas as pd

df = pd.DataFrame(data['results'])

from google.colab import drive
drive.mount('/content/drive')
df.to_csv('/content/drive/MyDrive/top_rated_tv_shows.csv', index=False)

Ways to Share the Dataset:
Google Drive: Upload and share a public link.
Kaggle: Create a public dataset for collaboration.
GitHub: Host the CSV file in a repository for easy sharing.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a list of the datasets provided by the UCI Machine Learning Repository. If you are a learner and want data filtered by year, category, profession, or some other criteria, you can search for it here.
The dataset has 8 columns in which all the details are given:
- link
- Data-Name
- data type
- default task
- attribute-type
- instances
- attributes
- year
Some missing values are also present.
You can analyse the data as per your requirements.
Assignment Topic: In this assignment, you will download the datasets provided, load them into a database, write and execute SQL queries to answer the problems provided, and upload a screenshot showing the correct SQL query and result for review by your peers. A Jupyter notebook is provided in the preceding lesson to help you with the process.
This assignment involves 3 datasets for the city of Chicago obtained from the Chicago Data Portal:
This dataset contains a selection of six socioeconomic indicators of public health significance and a hardship index, by Chicago community area, for the years 2008 - 2012.
This dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year.
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.
Instructions:
Before you begin, you will need to become familiar with the datasets. Snapshots for the three datasets in .CSV format can be downloaded from the following links:
Chicago Socioeconomic Indicators: Click here
Chicago Public Schools: Click here
Chicago Crime Data: Click here
NOTE: Ensure you have downloaded the datasets using the links above instead of directly from the Chicago Data Portal. The versions linked here are subsets of the original datasets and have some of the column names modified to be more database friendly which will make it easier to complete this assignment. The CSV file provided above for the Chicago Crime Data is a very small subset of the full dataset available from the Chicago Data Portal. The original dataset is over 1.55GB in size and contains over 6.5 million rows. For the purposes of this assignment you will use a much smaller sample with only about 500 rows.
Perform this step using the LOAD tool in the Db2 console. You will need to create 3 tables in the database, one for each dataset, named as follows, and then load the respective .CSV file into the table:
CENSUS_DATA
CHICAGO_PUBLIC_SCHOOLS
CHICAGO_CRIME_DATA
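The assignment itself uses the Db2 console LOAD tool, but if you want to prototype the table loading and a quick query locally first, here is a hedged stand-in using pandas and SQLite (the CSV file names are assumptions; use whatever names the downloaded files have):

import pandas as pd
import sqlite3

conn = sqlite3.connect("chicago.db")

# Load each downloaded CSV into a table named as required by the assignment
pd.read_csv("ChicagoCensusData.csv").to_sql("CENSUS_DATA", conn, if_exists="replace", index=False)
pd.read_csv("ChicagoPublicSchools.csv").to_sql("CHICAGO_PUBLIC_SCHOOLS", conn, if_exists="replace", index=False)
pd.read_csv("ChicagoCrimeData.csv").to_sql("CHICAGO_CRIME_DATA", conn, if_exists="replace", index=False)

# Example query against the loaded tables
print(pd.read_sql("SELECT COUNT(*) AS total_crimes FROM CHICAGO_CRIME_DATA", conn))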
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Ecommerce Dataset (Products & Sizes Included)
Essential Data for Building an Ecommerce Website & Analyzing Online Shopping Trends

Overview
This dataset contains 1,000+ ecommerce products, including detailed information on pricing, ratings, product specifications, seller details, and more. It is designed to help data scientists, developers, and analysts build product recommendation systems, price prediction models, and sentiment analysis tools.
Dataset Features
product_id: Unique identifier for the product
title: Product name/title
product_description: Detailed product description
rating: Average customer rating (0-5)
ratings_count: Number of ratings received
initial_price: Original product price
discount: Discount percentage (%)
final_price: Discounted price
currency: Currency of the price (e.g., USD, INR)
images: URL(s) of product images
delivery_options: Available delivery methods (e.g., standard, express)
product_details: Additional product attributes
breadcrumbs: Category path (e.g., Electronics > Smartphones)
product_specifications: Technical specifications of the product
amount_of_stars: Distribution of star ratings (1-5 stars)
what_customers_said: Customer reviews (sentiments)
seller_name: Name of the product seller
sizes: Available sizes (for clothing, shoes, etc.)
videos: Product video links (if available)
seller_information: Seller details, such as location and rating
variations: Different variants of the product (e.g., color, size)
best_offer: Best available deal for the product
more_offers: Other available deals/offers
category: Product category
Potential Use Cases
Build an Ecommerce Website: Use this dataset to design a functional online store with product listings, filtering, and sorting.
Price Prediction Models: Predict product prices based on features like ratings, category, and discount.
Recommendation Systems: Suggest products based on user preferences, rating trends, and customer feedback.
Sentiment Analysis: Analyze what_customers_said to understand customer satisfaction and product popularity.
Market & Competitor Analysis: Track pricing trends, popular categories, and seller performance.

Why Use This Dataset?
Rich Feature Set: Includes all necessary ecommerce attributes.
Realistic Pricing & Rating Data: Useful for price analysis and recommendations.
Multi-Purpose: Suitable for machine learning, web development, and data visualization.
Structured Format: Easy-to-use CSV format for quick integration.
Dataset Format
CSV file (ecommerce_dataset.csv)
1000+ samples
Multi-category coverage
How to Use?
Download the dataset from Kaggle.
Load it in Python using Pandas:
import pandas as pd
df = pd.read_csv("ecommerce_dataset.csv")
df.head()
Explore trends & patterns using visualization tools (Seaborn, Matplotlib).
Build models & applications based on the dataset!
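A quick exploratory sketch using the column names listed above; it assumes the price, discount, and rating columns are numeric (clean them first if they contain currency symbols):

import pandas as pd

df = pd.read_csv("ecommerce_dataset.csv")

# Sanity-check the discount column against the initial and final prices
implied_discount = (1 - df["final_price"] / df["initial_price"]) * 100
print((implied_discount - df["discount"]).abs().describe())

# Average rating per category, weighted by the number of ratings received
weighted = df["rating"] * df["ratings_count"]
by_category = weighted.groupby(df["category"]).sum() / df.groupby("category")["ratings_count"].sum()
print(by_category.sort_values(ascending=False).head(10))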
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
The test bench used to acquire the dataset was composed of two similar three-phase squirrel-cage induction machines, two frequency converters, a failure-emulation control panel, and a resistor load bank. One of the induction machines was specially prepared to enable the emulation of stator winding inter-turn short circuits. Its stator circuit was rewound so that taps along the winding are accessible, allowing inter-turn short circuits to be inserted. Different levels of short circuit can be emulated, from very incipient defects to severe situations. This machine operates as a motor, while the other machine emulates its mechanical load. The frequency converters are used to drive the induction machines, so the machines can work at different driving frequencies. The induction machines used have the following specifications: 4 poles, 1 HP of mechanical power, delta configuration, 220 V supply voltage, and 3 A rated current. The frequency converters are both WEG CFW-08 (WEG, 2019). Two types of faults were simulated:
x1 is the first channel, x2 the second, x3 the third, and x4 the fourth. Every 100,000 consecutive data points form one sample in the whole dataset.
Preprocessed from: This link
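A minimal sketch of splitting the four channels into fixed-length samples as described above; the file name and the assumption that the channels are stored as CSV columns x1-x4 are for illustration only:

import pandas as pd

# File name assumed; x1..x4 are the four channels described above
df = pd.read_csv("induction_machine_data.csv")

sample_len = 100_000  # every 100,000 consecutive points form one sample
n_samples = len(df) // sample_len

# Shape: (n_samples, sample_len, 4 channels)
samples = df[["x1", "x2", "x3", "x4"]].to_numpy()[: n_samples * sample_len]
samples = samples.reshape(n_samples, sample_len, 4)
print(samples.shape)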
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
I am preparing a book on change to add to my publications (https://robertolofaro.com/published), and I was looking into speeches delivered by the ECB, but the search on the website wasn't what I needed.
I started posting online updates in late 2019; currently, the online webapp that allows searching via a tag cloud is updated weekly, each Monday evening.
Search by tag: https://robertolofaro.com/ECBSpeech (links also to dataset on kaggle)
Since 2024-03-25, the dataset also contains AI-based audio transcripts of any ECB item collected, whenever the audio file is accessible.
source: ECB website
In late October/early November 2019, the ECB posted on LinkedIn a link to a CSV dataset covering 1997 up to 2019-10-25 with all the speeches delivered, as listed on their website.
The dataset was "flat", and I needed both to search quickly for associations of people with concepts, and to see the relevant speech directly in a human-readable format (as some speeches have pictures, tables, attachments, etc.).
So, I recycled a concept that I had developed for other purposes and used in an experimental "search by tag cloud on structured content" on https://robertolofaro.com/BFM2013tag
The result is https://robertolofaro.com/ECBSpeech, which contains information from the CSV file (see the website for the link to the source), plus the additional information shown within "About this file".
The concept behind sharing this dataset on Kaggle, and releasing on my public website the application I use to navigate the data (I have a local XAMPP where I use this and other applications to support the research side of my past business and current publication activities), is described at http://robertolofaro.com/datademocracy
This tag cloud contains the most common words 1997-2020 across the dataset
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3925987%2Fcf58205d2447ed7355c1a4e213f5b477%2F20200902_kagglerelease.png?generation=1599033600865103&alt=media
Thanks to the ECB for saving my time (I was going to copy-and-paste or "scrape" with R from the speeches posted on their website) by releasing the dataset https://www.ecb.europa.eu/press/key/html/downloads.en.html
In my cultural and organizational change activities, and in data collection, collation, and processing to support management decision-making (including my own) since the 1980s, I have always seen that the more data we collect, the less time there is to retrieve it when needed.
I usually worked across multiple environments, industries, cultures, and "collecting" was never good enough if I could not then "retrieve by association".
In storytelling it is fine just to roughly remember "cameos from the past", but in data storytelling (or when trying to implement a new organization, process, or even just software or data analysis), being able to pinpoint a source that might have been there before is equally important.
So, I am simply exploring different ways to cross-reference information from different domains, as I am quite confident that within all the open data (including the ECB speeches) there are the results of what niche experts saw on various items.
Therefore, why should time and resources be wasted on redoing what was already done by others, when you can start from their endpoint, adapting first and adopting then (if relevant)?
2020-01-25: added a GitHub repository for versioning and release of additional material, as uploading the new export_datamart.csv here was not possible; it is now available at: https://github.com/robertolofaro/ecbspeech
changes in the dataset: 1. fixed language codes 2. added speeches published on the ECB website in January 2020 (up to 2020-01-25 09:00 CET) 3. added all the items listed under the "interview" section of the ECB website
current content: 340 interviews, 2374 speeches
2020-01-29: the same file on GitHub released on 2020-01-25, containing both speeches and interviews, with an additional column to differentiate between the two, is now available on Kaggle
current content: 340 interviews, 2374 speeches
2020-02-26: monthly update, with items released on the ECB website up to 2020-02-22
current content: 2731 items, 345 interviews, 2386 speeches
2020-03-25: monthly update, with items released on the ECB website up to 2020-03-20
since March 2020, the dataset also includes press conferences available on the ECB website
current content: 2988 records (2392 speeches, 351 interviews, 245 press conferences)
2020-06-07: update, with items released on the ECB website up to 2020-06-07
since June 2020, the dataset includes also press conferences, blog posts, and podcasts available on the ECB website
current content: 3030 records (2399 speeches, 369 interviews, 247 press conferences, 8 blog posts, 7 ECB Podcast). ...
License: https://creativecommons.org/publicdomain/zero/1.0/
By SocialGrep [source]
A subreddit dataset is a collection of posts and comments made on Reddit's /r/datasets board. This dataset contains all the posts and comments made on the /r/datasets subreddit from its inception to March 1, 2022. The dataset was procured using SocialGrep. The data does not include usernames to preserve users' anonymity and to prevent targeted harassment.
In order to use this dataset, you will need software that can open CSV files, such as a spreadsheet application or a plain-text editor (for example, LibreOffice). You will also need a web browser such as Google Chrome or Mozilla Firefox.
Once you have the necessary software installed, open The Reddit Dataset folder and double-click on the the-reddit-dataset-dataset-posts.csv file to open it in your preferred editor.
In the document, you will see a list of posts with the following information for each one: title, sentiment, score, URL, created UTC, permalink, subreddit NSFW status, and subreddit name.
You can use this information to analyze trends in datasets posted on /r/datasets over time. For example, you could calculate the average score for all posts and compare it to the average score for posts in specific subreddits. Additionally, sentiment analysis could be performed on the titles of posts to see if there is a correlation between positive/negative sentiment and upvotes/downvotes.
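A hedged sketch of that kind of analysis on the posts file; it assumes created_utc is stored as a Unix epoch in seconds (adjust the parsing if it is an ISO date string):

import pandas as pd

posts = pd.read_csv("the-reddit-dataset-dataset-posts.csv")

# Overall average score of /r/datasets posts
print(posts["score"].mean())

# Average score per year, assuming created_utc is a Unix epoch in seconds
posts["year"] = pd.to_datetime(posts["created_utc"], unit="s").dt.year
print(posts.groupby("year")["score"].mean())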
- Finding correlations between different types of datasets
- Determining which datasets are most popular on Reddit
- Analyzing the sentiments of post and comments on Reddit's /r/datasets board
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: the-reddit-dataset-dataset-comments.csv

| Column name    | Description                                         |
|:---------------|:----------------------------------------------------|
| type           | The type of post. (String)                          |
| subreddit.name | The name of the subreddit. (String)                 |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean)     |
| created_utc    | The time the post was created, in UTC. (Timestamp)  |
| permalink      | The permalink for the post. (String)                |
| body           | The body of the post. (String)                      |
| sentiment      | The sentiment of the post. (String)                 |
| score          | The score of the post. (Integer)                    |
File: the-reddit-dataset-dataset-posts.csv

| Column name    | Description                                         |
|:---------------|:----------------------------------------------------|
| type           | The type of post. (String)                          |
| subreddit.name | The name of the subreddit. (String)                 |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean)     |
| created_utc    | The time the post was created, in UTC. (Timestamp)  |
| permalink      | The permalink for the post. (String)                |
| score          | The score of the post. (Integer)                    |
| domain         | The domain of the post. (String)                    |
| url            | The URL of the post. (String)                       |
| selftext       | The self-text of the post. (String)                 |
| title          | The title of the post. (String)                     |
If you use this dataset in your research, please credit SocialGrep.
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer, and each column contains a customer attribute described in the column Metadata.
The data set includes information about:
To explore this type of model and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
License: https://creativecommons.org/publicdomain/zero/1.0/
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.
It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.
The columns in this dataset are:
License: Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.
The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.
This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.
The following is the Google Colab link to the project, done on Jupyter Notebook -
https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN
The following is the GitHub Repository of the project -
https://github.com/daerkns/social-media-and-mental-health
Libraries used for the Project -
Pandas
Numpy
Matplotlib
Seaborn
Scikit-learn
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
I produced the dataset whilst working on the 2023 Kaggle AI report. The Meta Kaggle dataset provides helpful information about the Kaggle competitions but not the original descriptive text from the Kaggle web pages for each competition. We have information about the solutions but not the original problem. So, I wrote some web scraping scripts to collect and store that information.
Not all Kaggle web pages have that information available; some are missing or broken. Hence the nulls in the data. Secondly, note that not all previous Kaggle competitions exist in the Meta Kaggle data, which was used to collect the webpage slugs.
The scraping scripts iterate over the IDs in the Meta Kaggle competitions.csv data and attempt to collect the webpage data for each competition if it is currently null in the database. Hence, new IDs will cause the scripts to go and collect their data, and each week the scripts will try to fill in any links that were not working previously.
I have recently converted the original local scraping scripts on my machine into a Kaggle notebook that now updates this dataset weekly on Mondays. The notebook also explains the scraping procedure and its automation to keep this dataset up-to-date.
Note that the CompetitionId field joins to the Id of the competitions.csv of the Meta Kaggle dataset so that this information can be combined with the rest of Meta Kaggle.
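A minimal join sketch based on that key relationship; both file names are assumptions (use the actual Meta Kaggle competitions file and the CSV shipped with this dataset):

import pandas as pd

competitions = pd.read_csv("Competitions.csv")               # Meta Kaggle file (name assumed)
descriptions = pd.read_csv("competition_descriptions.csv")   # this dataset (name assumed)

# CompetitionId here joins to Id in Meta Kaggle's competitions file
merged = competitions.merge(descriptions, left_on="Id", right_on="CompetitionId", how="left")
print(merged.shape)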
My primary reason for collecting the data was for some text classification work I wanted to do, and I will publish it here soon. I hope that the data is useful to some other projects as well :-)
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
The data for this project was meticulously gathered from Goodreads, focusing on the curated list of books that are deemed essential reading. The data collection process was carried out in two distinct phases to ensure comprehensive and accurate capture of all relevant information.
The data collection effort resulted in the comprehensive gathering of details for 6,313 books. This dataset includes essential information such as book titles, URLs, detailed descriptions, and genres. The structured approach, involving separate scripts for URL extraction and detailed data scraping, ensures that the dataset is both thorough and well-organized. The final dataset, encapsulated in book_details.csv, provides a robust foundation for further exploration, analysis, and insights into the literary works recommended on Goodreads.
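A small loading sketch; only the file name comes from the description above, so inspect the columns rather than assuming them:

import pandas as pd

books = pd.read_csv("book_details.csv")
print(books.shape)             # expected to be around 6,313 rows
print(books.columns.tolist())  # titles, URLs, descriptions, genres, ...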
License: https://creativecommons.org/publicdomain/zero/1.0/
https://raw.githubusercontent.com/Masterx-AI/Project_Housing_Price_Prediction_/main/hs.jpg
A simple yet challenging project: predict the housing price based on factors like house area, number of bedrooms, furnishing status, proximity to the main road, etc. The dataset is small, yet its complexity arises from strong multicollinearity. Can you overcome these obstacles and build a decent predictive model?
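One common way to quantify that multicollinearity is the variance inflation factor (VIF); a hedged sketch, assuming the file is named Housing.csv and the target column is price:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

df = pd.read_csv("Housing.csv")  # file name assumed

# Numeric predictors only; 'price' is assumed to be the target column
X = add_constant(df.select_dtypes("number").drop(columns=["price"], errors="ignore"))

vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.drop("const").sort_values(ascending=False))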
Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81-102. Belsley, D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Mahshad Lotfinia and Soroosh Tayebi Arasteh. "Machine Learning-Based Generalized Model for Finite Element Analysis of Roll Deflection During the Austenitic Stainless Steel 316L Strip Rolling". arXiv:2102.02470, February 2021.
@misc{Stress316L,
title={Machine Learning-Based Generalized Model for Finite Element Analysis of Roll Deflection During the Austenitic Stainless Steel 316L Strip Rolling},
author={Mahshad Lotfinia and Soroosh Tayebi Arasteh},
year={2021},
eprint={2102.02470},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Unlike other groups of metals, Austenitic Stainless Steel 316L has an unpredictable stress-strain curve. Thus, we conducted a series of mechanical tensile tests at different strain rates. Using this dataset, a neural network can then be trained to predict a stress-strain curve that gives more accurate values of the flow stress during cold deformation.
We conducted four sets of uniaxial tensile tests at strain rates of 0.001 s^-1, 0.00052 s^-1, 0.0052 s^-1, and 0.052 s^-1 at room temperature on our Austenitic Stainless Steel 316L sample. According to the ASTM E8 standard, ASS316L sheets with an initial thickness of 4 mm, width of 6 mm, and gauge length of 32 mm were used for the tensile tests on a compression test machine (Electro Mechanic Instron 4208). The results were transferred to the Santam Machine Controller software for recording, which yielded the extension data (in mm) and the force data (in N); these were converted to true-strain and true-stress values. The conversion considered the cross-section bearing the load, which in our case was 24 mm^2.
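A sketch of the standard engineering-to-true conversion implied above, using the stated gauge length (32 mm) and cross-section (24 mm^2); the file and column names are assumptions:

import numpy as np
import pandas as pd

L0 = 32.0   # gauge length, mm
A0 = 24.0   # initial cross-section, mm^2

df = pd.read_csv("tensile_test.csv")       # file name assumed
eng_strain = df["extension_mm"] / L0       # extension recorded in mm (column name assumed)
eng_stress = df["force_N"] / A0            # force in N; N / mm^2 = MPa (column name assumed)

# Standard conversion, valid up to the onset of necking
true_strain = np.log(1 + eng_strain)
true_stress = eng_stress * (1 + eng_strain)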
15,858 different Strain-Stress values at 4 different strain rates.
All the files are provided in the "csv" format.
The dataset URL:
https://kaggle.com/mahshadlotfinia/Stress316L/
The accompanying dataset is released under a Creative Commons Attribution 4.0 International License.
The official source code of the paper: https://github.com/mahshadlotfinia/Stress316L/
E-mail: mahshad.lotfinia@alum.sharif.edu
Materials Science and Engineering Mechanical Lab, the Sharif University of Technology, Tehran, Iran.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset serves as a link between the competition dataset CSV: https://www.kaggle.com/competitions/rsna-breast-cancer-detection
and the 256x256 images of that dataset created here: https://www.kaggle.com/datasets/theoviel/rsna-breast-cancer-256-pngs
This should allow the data to be read in as a directory by TensorFlow, with the labels attached to the images themselves rather than kept in a separate CSV file.
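A hedged sketch of that directory-based loading, assuming the PNGs have been arranged into one sub-folder per label so tf.keras can infer the labels from the folder names:

import tensorflow as tf

# Assumes a layout such as images/cancer/... and images/no_cancer/...
train_ds = tf.keras.utils.image_dataset_from_directory(
    "images",
    labels="inferred",
    label_mode="binary",
    image_size=(256, 256),
    batch_size=32,
)

for batch_images, batch_labels in train_ds.take(1):
    print(batch_images.shape, batch_labels.shape)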
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. The classes are completely mutually exclusive. There are 50000 training images and 10000 test images.
The batches.meta file contains the label names of each class.
The dataset was originally divided into 5 training batches with 10000 images per batch. The original dataset can be found here: https://www.cs.toronto.edu/~kriz/cifar.html. This dataset contains all the training data and test data in the same CSV file so it is easier to load.
Here is the list of the 10 classes in the CIFAR-10:
Classes:
0: airplane
1: automobile
2: bird
3: cat
4: deer
5: dog
6: frog
7: horse
8: ship
9: truck
The function used to open the file:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
Example of how to read the file:
metadata_path = './cifar-10-python/batches.meta' # change this path
metadata = unpickle(metadata_path)
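Continuing the example, the class names can be pulled out of the metadata dictionary; in the original CIFAR-10 pickle files the keys are byte strings:

label_names = [name.decode('utf-8') for name in metadata[b'label_names']]
print(label_names)  # ['airplane', 'automobile', ..., 'truck']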
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
A microsoft/deberta-v3-base model fine-tuned for sentiment regression on training.1600000.processed.noemoticon.csv.

Columns:
target: Sentiment polarity (converted to float)
ids: Tweet IDs
date: Date of the tweet
flag: Query flag
user: User handle
text: Tweet text

The CSV is read with ISO-8859-1 encoding and the target column is converted to float. The text is tokenized for microsoft/deberta-v3-base using AutoTokenizer from Hugging Face with max_length=160 and padding='max_length'. Training uses mixed precision (fp16=True) and the Trainer class for model training and evaluation.
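A hedged sketch of the setup described above; the hyperparameters beyond those explicitly stated (max_length=160, padding='max_length', fp16=True) are illustrative rather than the exact training configuration:

import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

cols = ["target", "ids", "date", "flag", "user", "text"]
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 encoding="ISO-8859-1", names=cols)   # names supplied, assuming no header row
df["target"] = df["target"].astype(float)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

def tokenize(batch):
    return tokenizer(batch["text"], max_length=160,
                     padding="max_length", truncation=True)

ds = Dataset.from_pandas(df[["text", "target"]].rename(columns={"target": "labels"}))
ds = ds.map(tokenize, batched=True)

# num_labels=1 gives a single-output regression head
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=1)

args = TrainingArguments(output_dir="out", fp16=True,
                         per_device_train_batch_size=32, num_train_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=ds)
# trainer.train()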
@misc{he2021debertav3,
title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing},
author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
year={2021},
eprint={2111.09543},
archivePrefix={arXiv},
primaryClass={cs.CL}}
@inproceedings{
he2021deberta,
title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=XPZIaotutsD}}