This dataset was created by chewytteok
Dataset for my (German) Python Data Science Tutorial on YouTube.
Playlist: https://www.youtube.com/playlist?list=PLW4WJMmOF9juA1Ebs1vNwTBuF7ck6YCT7
My version of: 'Bike Share Daily Data' (https://www.kaggle.com/contactprad/bike-share-daily-data)
Data used in this competition: https://www.kaggle.com/c/bike-sharing-demand
Use of this dataset in publications must cite the following publication:
[1] Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.
@article{
  year={2013},
  issn={2192-6352},
  journal={Progress in Artificial Intelligence},
  doi={10.1007/s13748-013-0040-3},
  title={Event labeling combining ensemble detectors and background knowledge},
  url={http://dx.doi.org/10.1007/s13748-013-0040-3},
  publisher={Springer Berlin Heidelberg},
  keywords={Event labeling; Event detection; Ensemble learning; Background knowledge},
  author={Fanaee-T, Hadi and Gama, Joao},
  pages={1-15}
}
This dataset was created by Rahul Nakka
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Şükrü Yusuf Kaya
Released under MIT
This dataset was created by Chao CHEN
This dataset was created by Joe Fitzgerald
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Kristo Radion Purba
Released under Apache 2.0
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Daniil Barysevich
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is composed of several others and was collected specifically for the Vowpal Wabbit tutorial kernel. The tutorial covers, both theoretically and in practice, two reasons for Vowpal Wabbit's exceptional training speed: online learning and the hashing trick. We'll try it out with the Spooky Author Identification dataset, as well as with news, letters, and movie review datasets and gigabytes of StackOverflow questions.
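The hashing trick itself fits in a few lines: instead of building a vocabulary, each token is hashed straight to a column index in a fixed-size feature matrix, so memory does not grow with the corpus. A minimal Python sketch using scikit-learn's HashingVectorizer (the sample texts and the 2**18 feature count are illustrative assumptions, not the tutorial's exact settings):

from sklearn.feature_extraction.text import HashingVectorizer

# placeholder documents; the tutorial applies the same idea to the
# Spooky Author, news, movie-review and StackOverflow corpora
texts = [
    "It was a dark and stormy night",
    "The raven croaked nevermore",
]

# n_features fixes the dimensionality up front; no vocabulary is stored
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
X = vectorizer.transform(texts)
print(X.shape, X.nnz)  # (2, 262144) and the number of non-zero hashed features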
The included datasets are:
This dataset was created by Abhinand
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by pistachio_overlord
Released under CC0: Public Domain
This dataset was created by Chao CHEN
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by annie2509
Released under MIT
This is the dataset that goes along with the Deep Learning basics with Python, TensorFlow and Keras p.2 Tutorial provided by Sentdex. Link here: https://www.youtube.com/watch?v=j-3vuBynnOE&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=2
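If it helps, here is a rough sketch of how that tutorial's image data is typically loaded in Python with OpenCV; the PetImages/Cat and PetImages/Dog folder layout and the 50x50 grayscale resize follow the tutorial, but treat the exact paths as assumptions about where you unpacked this dataset:

import os
import cv2
import numpy as np

DATADIR = 'PetImages'          # assumed extraction directory
CATEGORIES = ['Dog', 'Cat']    # Dog -> label 0, Cat -> label 1
IMG_SIZE = 50

training_data = []
for label, category in enumerate(CATEGORIES):
    path = os.path.join(DATADIR, category)
    for img_name in os.listdir(path):
        img = cv2.imread(os.path.join(path, img_name), cv2.IMREAD_GRAYSCALE)
        if img is None:        # skip unreadable/corrupted files
            continue
        training_data.append((cv2.resize(img, (IMG_SIZE, IMG_SIZE)), label))

X = np.array([img for img, _ in training_data]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y = np.array([label for _, label in training_data])
print(X.shape, y.shape)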
This dataset was created by Pritam Purohit
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Guangyu Song
Released under CC0: Public Domain
Data for the pytorch example: https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
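A short, hedged sketch of reading that data the way the PyTorch tutorial expects (the data/names/*.txt layout is the one shipped with the tutorial's download; the path is an assumption about where you extracted it):

import glob
import os
import string
import unicodedata

ALL_LETTERS = string.ascii_letters + " .,;'"

def unicode_to_ascii(s):
    # strip accents so every surname fits the ASCII letter set above
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn' and c in ALL_LETTERS)

# one .txt file per language, one surname per line
category_lines = {}
for filename in glob.glob('data/names/*.txt'):
    language = os.path.splitext(os.path.basename(filename))[0]
    with open(filename, encoding='utf-8') as f:
        category_lines[language] = [unicode_to_ascii(line.strip()) for line in f]

print(len(category_lines), 'languages loaded')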
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Shall we play together? The mugunghwa flower has bloomed! 😜 This is a dataset for preparing for the practical (hands-on) part of the Big Data Analysis Engineer certification exam. If you come up with better code, please share it 🎉 (Both Python and R are welcome.)
Classification (advanced variation of a 3rd exam question): https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline
Task Type 1 (3rd exam question types)
Task Type 1 mock exam 2 (advanced): https://www.kaggle.com/code/agileteam/mock-exam2-type1-1-2
Check the problems and code in the Tasks tab
[2nd exam question types] Task Type 1 P: https://www.kaggle.com/agileteam/tutorial-t1-2-python R: https://www.kaggle.com/limmyoungjin/tutorial-t1-2-r-2
Official sample problems (Task Type 1) P: https://www.kaggle.com/agileteam/tutorial-t1-python R: https://www.kaggle.com/limmyoungjin/tutorial-t1-r
T1-1. Outlier (IQR) / #outliers #IQR (a generic pandas sketch of this pattern appears after the Type 1 list below) P: https://www.kaggle.com/agileteam/py-t1-1-iqr-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-1-iqr-expected-questions-2
T1-2. Outlier (age) / #outliers #fractional-ages P: https://www.kaggle.com/agileteam/py-t1-2-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-2-expected-questions-2
T1-3. Missing data / #missing-values #drop #median #mean P: https://www.kaggle.com/agileteam/py-t1-3-map-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-3-expected-questions-2
T1-4. Skewness and Kurtosis (Log Scale) / #skewness #kurtosis #log-scale P: https://www.kaggle.com/agileteam/py-t1-4-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-4-expected-questions-2
T1-5. Standard deviation / #standard-deviation P: https://www.kaggle.com/agileteam/py-t1-5-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-5-expected-questions-2
T1-6. Groupby Sum / #missing-values #conditions P: https://www.kaggle.com/agileteam/py-t1-6-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-6-expected-questions-2
T1-7. Replace / #value-replacement #conditions #maximum P: https://www.kaggle.com/agileteam/py-t1-7-2-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-7-2-expected-questions-2
T1-8. Cumulative Sum / #cumulative-sum #missing-values #interpolation P: https://www.kaggle.com/agileteam/py-t1-8-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-8-expected-questions-2
T1-9. Standardization / #standardization #median P: https://www.kaggle.com/agileteam/py-t1-9-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-9-expected-questions-2
T1-10. Yeo-Johnson and Box-Cox / #Yeo-Johnson #Box-Cox #missing-values #mode P: https://www.kaggle.com/agileteam/py-t1-10-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-10-expected-questions-2
T1-11. Min-max scaling / #scaling #top-bottom-values P: https://www.kaggle.com/agileteam/py-t1-11-min-max-5-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-11-min-max-5-expected-questions-2
T1-12. Top10-bottom10 / #grouping #sorting #top-bottom-values P: https://www.kaggle.com/agileteam/py-t1-12-10-10-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-12-10-expected-questions-2
T1-13. Correlation / #correlation P: https://www.kaggle.com/agileteam/py-t1-13-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-13-expected-questions-2
T1-14. Multi Index & Groupby / #multi-index #sorting #index-reset #top-values P: https://www.kaggle.com/agileteam/py-t1-14-2-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-14-2-expected-question-2
T1-15. Slicing & Condition / #slicing #missing-values #median #conditions P: https://www.kaggle.com/agileteam/py-t1-15-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-15-expected-question-2
T1-16. Variance / #variance #difference-before-and-after-filling-missing-values P: https://www.kaggle.com/agileteam/py-t1-16-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-16-expected-question-2
T1-17. Time-Series1 / #time-series #datetime P: https://www.kaggle.com/agileteam/py-t1-17-1-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-17-1-expected-question-2
T1-18. Time-Series2 / #weekend #weekday #comparison #time-series P: https://www.kaggle.com/agileteam/py-t1-18-2-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-18-2-expected-question-2
T1-19. Time-Series3 (monthly total) / #monthly #totals #comparison #value-replacement
P: https://www.kaggle.com/agileteam/py-t1-19-3-expected-question
R: https://www.kaggle.com/limmyoungjin/r-t1-19-3-expected-question-2
T1-20. Combining Data / 데이터 #병합 #결합 / 고객과 궁합이 맞는 타입 매칭
P: https://www.kaggle.com/agileteam/py-t1-20-expected-question
R: https://www.kaggle.com/limmyoungjin/r-t1-20-expected-question-2
T1-21. Binning Data / #binning #bucketing P: https://www.kaggle.com/agileteam/py-t1-21-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-21-expected-question-2
T1-22. Time-Series4 (Weekly data) / #weekly #sum P: https://www.kaggle.com/agileteam/t1-22-time-series4-weekly-data R: https://www.kaggle.com/limmyoungjin/r-t1-22-time-series4-weekly-data-2
T1-23. Drop Duplicates / #drop-duplicates #missing-values #fill-with-10th-value P: https://www.kaggle.com/agileteam/t1-23-drop-duplicates R: https://www.kaggle.com/limmyoungjin/r-t1-23-drop-duplicates-2
T1-24. Time-Series5 (Lagged Feature) / #lagged-data #conditions P: https://www.kaggle.com/agileteam/t1-24-time-series5-lagged-feature R: https://www.kaggle.com/limmyoungjin/r-t1-24-time-series5-2
[MOCK EXAM1] TYPE1 / Task Type 1 mock exam P: https://www.kaggle.com/agileteam/mock-exam1-type1-1-tutorial R: https://www.kaggle.com/limmyoungjin/mock-exam1-type1-1
[MOCK EXAM2] TYPE1 / Task Type 1 mock exam 2 P: https://www.kaggle.com/code/agileteam/mock-exam2-type1-1-2
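As a generic illustration of the Type 1 tasks above (referenced from T1-1), a minimal IQR outlier filter in pandas; the DataFrame and the 'age' column are placeholders, not the actual exam data:

import pandas as pd

# placeholder data; the real tasks load the CSV files attached to each notebook
df = pd.DataFrame({'age': [22, 25, 27, 29, 31, 33, 35, 120]})

q1, q3 = df['age'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = df[(df['age'] < lower) | (df['age'] > upper)]
print(len(outliers))  # count of IQR outliers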
Check the problems and code in the Tasks tab - [3rd exam question type, Task Type 2]: travel insurance package product (the data was modified to be a bit harder) P: https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline
[2nd exam question type, Task Type 2]: E-Commerce Shipping Data P: https://www.kaggle.com/agileteam/tutorial-t2-2-python R: https://www.kaggle.com/limmyoungjin/tutorial-t2-2-r
T2. Exercise / sample problem: one year of department-store customer data (official dataq example) P: https://www.kaggle.com/agileteam/t2-exercise-tutorial-baseline
T2-1. Titanic (Classification) (a generic baseline sketch appears after this list) P: https://www.kaggle.com/agileteam/t2-1-titanic-simple-baseline R: https://www.kaggle.com/limmyoungjin/r-t2-1-titanic
T2-2. Pima Indians Diabetes (Classification) P: https://www.kaggle.com/agileteam/t2-2-pima-indians-diabetes R: https://www.kaggle.com/limmyoungjin/r-t2-2-pima-indians-diabetes
T2-3. Adult Census Income (Classification) / adult income prediction P: https://www.kaggle.com/agileteam/t2-3-adult-census-income-tutorial R: https://www.kaggle.com/limmyoungjin/r-t2-3-adult-census-income
T2-4. House Prices (Regression) / house price prediction / RMSE P: https://www.kaggle.com/code/blighpark/t2-4-house-prices-regression R: https://www.kaggle.com/limmyoungjin/r-t2-4-house-prices
T2-5. Insurance Forecast (Regression) P: https://www.kaggle.com/agileteam/insurance-starter-tutorial R: https://www.kaggle.com/limmyoungjin/r-t2-5-insurance-prediction
T2-6. Bike-sharing-demand (Regression) / bike demand prediction / RMSLE R: https://www.kaggle.com/limmyoungjin/r-t2-6-bike-sharing-demand
[MOCK EXAM1] TYPE2. HR-DATA / Task Type 2 mock exam P: https://www.kaggle.com/agileteam/mock-exam-t2-exam-template (template only), https://www.kaggle.com/agileteam/mock-exam-t2-starter-tutorial (for absolute beginners), https://www.kaggle.com/agileteam/mock-exam-t2-baseline-tutorial (baseline)
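As a generic baseline sketch for the Type 2 (modeling) tasks above (referenced from T2-1); the file name 'train.csv', the 'target' column, and the model choice are assumptions for illustration, not the exam's actual setup:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# placeholder file and column names; each task provides its own train/test CSVs
train = pd.read_csv('train.csv')
X = pd.get_dummies(train.drop(columns=['target']))   # quick one-hot encoding
y = train['target']

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_tr, y_tr)
pred = model.predict_proba(X_val)[:, 1]
print('validation AUC:', roc_auc_score(y_val, pred))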
Week | Type (editor) | Numbers |
---|---|---|
6 weeks before the exam | Task Type 1 (notebook) | T1-1~5 |
5 weeks before | Task Type 1 (notebook) | T1-6~9, T1 EQ (past exam) |
4 weeks before | Task Type 1 (script), Task Type 2 (notebook) | T1-10~13, T1.Ex, T2 EQ, T2-1 |
3 weeks before | Task Type 1 (script), Task Type 2 (notebook) | T1-14~19, T2-2~3 |
2 weeks before | Task Type 1 (script), Task Type 2 (script) | T1-20~21, T2-4~6, review |
1 week before | Task Type 1, Task Type 2 (script), short-answer questions | T1-22~24, mock exams, review, exam environment trial, short answers |
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Use this dataset with Misra's Pandas tutorial: How to use the Pandas GroupBy function | Pandas tutorial
The original dataset came from this site: https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t/data
I used Google Colab to filter the columns with the following Pandas commands. Here's a Colab Notebook you can use with the commands listed below: https://colab.research.google.com/drive/17Jpgeytc075CpqDnbQvVMfh9j-f4jM5l?usp=sharing
Once the csv file is uploaded to Google Colab, use these commands to process the file.
import pandas as pd

# load the file and create a pandas dataframe
df = pd.read_csv('/content/NYC_Jobs.csv')

# keep only these columns
df = df[['Job ID', 'Civil Service Title', 'Agency', 'Posting Type', 'Job Category', 'Salary Range From', 'Salary Range To']]

# save the csv file without the index column
df.to_csv('/content/NYC_Jobs_filtered_cols.csv', index=False)
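Building on the filtered file, a small grouped aggregation in the spirit of the GroupBy tutorial; the column names come from the file created above, but the specific aggregation is only an illustrative assumption:

import pandas as pd

df = pd.read_csv('/content/NYC_Jobs_filtered_cols.csv')

# average posted salary range per agency, highest first
salary_by_agency = (
    df.groupby('Agency')[['Salary Range From', 'Salary Range To']]
      .mean()
      .sort_values('Salary Range To', ascending=False)
)
print(salary_by_agency.head())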
https://creativecommons.org/publicdomain/zero/1.0/
The competition ended over two years ago. I just want to play around with the dataset.
The labeled dataset consists of 50,000 IMDB movie reviews, specially selected for sentiment analysis. The sentiment of the reviews is binary: an IMDB rating < 5 results in a sentiment score of 0, and a rating >= 7 results in a sentiment score of 1. No individual movie has more than 30 reviews. The 25,000-review labeled training set does not include any of the same movies as the 25,000-review test set. In addition, there are another 50,000 IMDB reviews provided without any rating labels.
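The labeling rule above is easy to express directly; a tiny sketch, with the example ratings and column handling assumed rather than taken from the files themselves:

import pandas as pd

ratings = pd.Series([2, 4, 7, 9, 10])   # example IMDB ratings

# rating < 5 -> negative (0), rating >= 7 -> positive (1);
# ratings of 5-6 are excluded from the labeled set entirely
sentiment = ratings.map(lambda r: 0 if r < 5 else (1 if r >= 7 else None))
print(sentiment.tolist())   # [0, 0, 1, 1, 1]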
The original source is here. There is an awesome tutorial here that we can play with.
Just for study and learning
This dataset was created by chewytteok