13 datasets found
  1. Top Rated TV Shows

    • kaggle.com
    zip
    Updated Jan 5, 2025
    Cite
    Shreya Gupta (2025). Top Rated TV Shows [Dataset]. https://www.kaggle.com/datasets/shreyajii/top-rated-tv-shows
    Explore at:
    Available download formats: zip (314571 bytes)
    Dataset updated
    Jan 5, 2025
    Authors
    Shreya Gupta
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset provides information about top-rated TV shows, collected from The Movie Database (TMDb) API. It can be used for data analysis, recommendation systems, and insights on popular television content.

    Key Stats:

    Total Pages: 109
    Total Results: 2098 TV shows
    Data Source: TMDb API
    Sorting Criteria: highest-rated by vote_average (average rating), with a minimum vote count of 200

    Data Fields (Columns):

    id: Unique identifier for the TV show
    name: Title of the TV show
    vote_average: Average rating given by users
    vote_count: Total number of votes received
    first_air_date: The date when the show was first aired
    original_language: Language in which the show was originally produced
    genre_ids: Genre IDs linked to the show's genres
    overview: A brief summary of the show
    popularity: Popularity score based on audience engagement
    poster_path: URL path for the show's poster image

    Accessing the Dataset via API (Python Example):

        import requests

        api_key = 'YOUR_API_KEY_HERE'
        url = "https://api.themoviedb.org/3/discover/tv"
        params = {
            'api_key': api_key,
            'include_adult': 'false',
            'language': 'en-US',
            'page': 1,
            'sort_by': 'vote_average.desc',
            'vote_count.gte': 200
        }

        response = requests.get(url, params=params)
        data = response.json()

        # Display the first show
        print(data['results'][0])
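
    The request above returns only the first page. A minimal sketch for collecting every page reported in the key stats (109 pages, about 2098 shows), reading the page count from the API response rather than hard-coding it:

        import requests

        api_key = 'YOUR_API_KEY_HERE'
        url = "https://api.themoviedb.org/3/discover/tv"
        params = {
            'api_key': api_key,
            'include_adult': 'false',
            'language': 'en-US',
            'sort_by': 'vote_average.desc',
            'vote_count.gte': 200,
            'page': 1
        }

        shows = []
        first = requests.get(url, params=params).json()
        shows.extend(first['results'])
        total_pages = first.get('total_pages', 1)   # about 109 according to the key stats above

        for page in range(2, total_pages + 1):
            params['page'] = page
            shows.extend(requests.get(url, params=params).json()['results'])

        print(len(shows))   # should be close to the 2098 shows in the dataset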

    Dataset Use Cases:

    Data Analysis: Explore trends in highly-rated TV shows.
    Recommendation Systems: Build personalized TV show suggestions.
    Visualization: Create charts to showcase ratings or genre distribution.
    Machine Learning: Predict show popularity using historical data.

    Exporting and Sharing the Dataset (Google Colab Example):

        import pandas as pd
        from google.colab import drive

        # Convert the API data to a DataFrame
        df = pd.DataFrame(data['results'])

        # Save to CSV and upload to Google Drive
        drive.mount('/content/drive')
        df.to_csv('/content/drive/MyDrive/top_rated_tv_shows.csv', index=False)

    Ways to Share the Dataset:

    Google Drive: Upload and share a public link.
    Kaggle: Create a public dataset for collaboration.
    GitHub: Host the CSV file in a repository for easy sharing.

  2. auto-pale

    • huggingface.co
    Updated Oct 31, 2023
    + more versions
    Cite
    zeionara (2023). auto-pale [Dataset]. https://huggingface.co/datasets/zeio/auto-pale
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 31, 2023
    Authors
    zeionara
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset card for pale

      Dataset summary
    

    This dataset contains League of Legends champions' quotes parsed from Fandom. See the dataset usage example on Google Colab. The dataset is available in the following configurations:

    vanilla - all data pulled from the website without significant modifications apart from the web page structure parsing
    quotes - a truncated version of the corpus which doesn't contain sound effects
    annotated - an extended version of the full configuration… See the full description on the dataset page: https://huggingface.co/datasets/zeio/auto-pale.
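
    As a rough usage sketch, assuming the configuration names above are exposed as Hugging Face configs (as the card suggests), the corpus can be loaded with the datasets library:

        from datasets import load_dataset

        # Load the truncated "quotes" configuration (no sound effects); other configs: "vanilla", "annotated"
        quotes = load_dataset("zeio/auto-pale", "quotes")

        print(quotes)                             # available splits and row counts
        first_split = next(iter(quotes.keys()))   # split names depend on the dataset card
        print(quotes[first_split][0])             # inspect the first record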

  3. OpenOrca

    • kaggle.com
    • opendatalab.com
    • +1more
    zip
    Updated Nov 22, 2023
    Cite
    The Devastator (2023). OpenOrca [Dataset]. https://www.kaggle.com/datasets/thedevastator/open-orca-augmented-flan-dataset/versions/2
    Explore at:
    Available download formats: zip (2548102631 bytes)
    Dataset updated
    Nov 22, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Open-Orca Augmented FLAN Dataset

    Unlocking Advanced Language Understanding and ML Model Performance

    By Huggingface Hub [source]

    About this dataset

    The Open-Orca Augmented FLAN Collection is a revolutionary dataset that unlocks new levels of language understanding and machine learning model performance. This dataset was created to support research on natural language processing, machine learning models, and language understanding through leveraging the power of reasoning trace-enhancement techniques. By enabling models to understand complex relationships between words, phrases, and even entire sentences in a more robust way than ever before, this dataset provides researchers expanded opportunities for furthering the progress of linguistics research. With its unique combination of features including system prompts, questions from users and responses from systems, this dataset opens up exciting possibilities for deeper exploration into the cutting edge concepts underlying advanced linguistics applications. Experience a new level of accuracy and performance - explore Open-Orca Augmented FLAN Collection today!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This guide provides an introduction to the Open-Orca Augmented FLAN Collection dataset and outlines how researchers can utilize it for their language understanding and natural language processing (NLP) work. The Open-Orca dataset includes system prompts, questions posed by users, and responses from the system.

    Getting Started: The first step is to download the dataset from Kaggle at https://www.kaggle.com/openai/open-orca-augmented-flan and save it in a project directory of your choice, on your computer or in cloud storage. Once the dataset is downloaded, launch Jupyter Notebook or Google Colab, whichever environment you want to use to work with it.

    Exploring & Preprocessing Data: To get a better understanding of the features in this dataset, import them into a Pandas DataFrame as shown below. You can use other libraries as needed:

        import pandas as pd   # library used for loading the dataset into Python

        df = pd.read_csv('train.csv')   # imports the train CSV file into a Pandas DataFrame

        df[['system_prompt', 'question', 'response']].head()   # views the top 5 rows of the 'system_prompt', 'question' and 'response' columns


    After importing, check each feature using basic descriptive statistics such as Pandas value counts or groupby statements; these give greater clarity over the values present in each feature. The command below shows the count of each distinct value in the system_prompt column of the train CSV file:

        df['system_prompt'].value_counts().head()   # shows the count of each value present in the 'system_prompt' column

    Output: 'User says hello guys': 587, 'System asks How are you?': 555 times, 'User says I am doing good': 487 times, ...and so on
    

    Data Transformation: After inspecting and exploring the features, you may need to make changes that suit your needs before training modeling algorithms on this dataset.
    A common transformation step is removing punctuation marks: since punctuation may not add value to downstream computation, it can be stripped with a regex replace such as .str.replace('[^A-Za-z ]+', '', regex=True), as sketched below.
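
    A minimal cleanup sketch along those lines, assuming the train.csv layout described above (the exact columns to clean and the cleaning rules are up to you):

        import pandas as pd

        df = pd.read_csv('train.csv')

        # Strip punctuation and lowercase the text columns before modeling
        for col in ['question', 'response']:
            df[col] = (
                df[col]
                .astype(str)
                .str.replace('[^A-Za-z ]+', '', regex=True)   # keep letters and spaces only
                .str.lower()
                .str.strip()
            )

        print(df[['question', 'response']].head())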

    Research Ideas

    • Automated Question Answering: Leverage the dataset to train and develop question answering models that can provide tailored answers to specific user queries while retaining language understanding abilities.
    • Natural Language Understanding: Use the dataset as an exploratory tool for fine-tuning natural language processing applications, such as sentiment analysis, document categorization, parts-of-speech tagging and more.
    • Machine Learning Optimizations: The dataset can be used to build highly customized machine learning pipelines that allow users to harness the power of conditioning data with pre-existing rules or models for improved accuracy and performance in automated tasks

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](ht...

  4. watches

    • huggingface.co
    Updated Nov 17, 2025
    Cite
    gil (2025). watches [Dataset]. https://huggingface.co/datasets/yotam22/watches
    Explore at:
    Dataset updated
    Nov 17, 2025
    Authors
    gil
    Description

    🕰️ Exploratory Data Analysis of Luxury Watch Prices

      Overview
    

    This project analyzes a large dataset of luxury watches to understand which factors influence price. We focus on brand, movement type, case material, size, gender, and production year. All work was done in Python (Pandas, NumPy, Matplotlib/Seaborn) on Google Colab.

      Dataset
    

    Rows: ~172,000
    Columns: 14
    Unit of observation: one watch listing

    Main columns

    name – watch/listing title
    price – listed… See the full description on the dataset page: https://huggingface.co/datasets/yotam22/watches.
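
    A minimal EDA sketch in the spirit of the project, assuming the data loads through the Hugging Face datasets library; only name and price are confirmed by the truncated card, so the split name and the brand column are assumptions:

        import pandas as pd
        from datasets import load_dataset

        ds = load_dataset("yotam22/watches", split="train")   # split name assumed
        df = ds.to_pandas()

        print(df.shape)                # expect roughly 172,000 rows x 14 columns
        print(df.columns.tolist())

        # Median listed price per brand, if those columns exist
        if {"brand", "price"}.issubset(df.columns):
            print(df.groupby("brand")["price"].median().sort_values(ascending=False).head(10))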

  5. Robot@Home2, a robotic dataset of home environments

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Apr 4, 2024
    + more versions
    Cite
    Ambrosio-Cestero, Gregorio; Ruiz-Sarmiento, José Raul; González-Jiménez, Javier (2024). Robot@Home2, a robotic dataset of home environments [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3901563
    Explore at:
    Dataset updated
    Apr 4, 2024
    Dataset provided by
    University of Málaga
    Authors
    Ambrosio-Cestero, Gregorio; Ruiz-Sarmiento, José Raul; González-Jiménez, Javier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Robot-at-Home dataset (Robot@Home, paper here) is a collection of raw and processed data from five domestic settings compiled by a mobile robot equipped with 4 RGB-D cameras and a 2D laser scanner. Its main purpose is to serve as a testbed for semantic mapping algorithms through the categorization of objects and/or rooms.

    This dataset is unique in three aspects:

    The provided data were captured with a rig of 4 RGB-D sensors with an overall field of view of 180°H. and 58°V., and with a 2D laser scanner.

    It comprises diverse and numerous data: sequences of RGB-D images and laser scans from the rooms of five apartments (87,000+ observations were collected), topological information about the connectivity of these rooms, and 3D reconstructions and 2D geometric maps of the visited rooms.

    The provided ground truth is dense, including per-point annotations of the categories of the objects and rooms appearing in the reconstructed scenarios, and per-pixel annotations of each RGB-D image within the recorded sequences

    During the data collection, a total of 36 rooms were completely inspected, so the dataset is rich in contextual information of objects and rooms. This is a valuable feature, missing in most of the state-of-the-art datasets, which can be exploited by, for instance, semantic mapping systems that leverage relationships like pillows are usually on beds or ovens are not in bathrooms.

    Robot@Home2

    Robot@Home2 is an enhanced version aimed at improving usability and functionality for developing and testing mobile robotics and computer vision algorithms. It consists of three main components. Firstly, a relational database that stores the contextual information and data links, compatible with the Structured Query Language (SQL). Secondly, a Python package for managing the database, including downloading, querying, and interfacing functions. Finally, learning resources in the form of Jupyter notebooks, runnable locally or on the Google Colab platform, enable users to explore the dataset without local installations. These freely available tools are expected to make the Robot@Home dataset easier to exploit and to accelerate research in computer vision and robotics.
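
    Because version 2 packs the data into a single SQLite file, the database can also be inspected with Python's built-in sqlite3 module, independently of the toolbox. A minimal sketch, with the database filename treated as an assumption (adjust it to your download) and the schema read from the file itself:

        import sqlite3

        conn = sqlite3.connect("rh.db")   # path to the Robot@Home2 SQLite file (assumed name)
        cur = conn.cursor()

        # List the tables actually present in the database before querying anything
        cur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;")
        for (table,) in cur.fetchall():
            print(table)

        conn.close()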

    If you use Robot@Home2, please cite the following paper:

    Gregorio Ambrosio-Cestero, Jose-Raul Ruiz-Sarmiento, Javier Gonzalez-Jimenez, The Robot@Home2 dataset: A new release with improved usability tools, in SoftwareX, Volume 23, 2023, 101490, ISSN 2352-7110, https://doi.org/10.1016/j.softx.2023.101490.

    @article{ambrosio2023robotathome2,
      title = {The Robot@Home2 dataset: A new release with improved usability tools},
      author = {Gregorio Ambrosio-Cestero and Jose-Raul Ruiz-Sarmiento and Javier Gonzalez-Jimenez},
      journal = {SoftwareX},
      volume = {23},
      pages = {101490},
      year = {2023},
      issn = {2352-7110},
      doi = {https://doi.org/10.1016/j.softx.2023.101490},
      url = {https://www.sciencedirect.com/science/article/pii/S2352711023001863},
      keywords = {Dataset, Mobile robotics, Relational database, Python, Jupyter, Google Colab}
    }

    Version history:
    v1.0.1 Fixed minor bugs.
    v1.0.2 Fixed some inconsistencies in some directory names. Fixes were necessary to automate the generation of the next version.
    v2.0.0 SQL-based dataset. Robot@Home v1.0.2 has been packed into a SQLite database along with the RGB-D and scene files, which have been assembled into a hierarchically structured directory free of redundancies. Path tables are also provided to reference files in both the v1.0.2 and v2.0.0 directory hierarchies. This version has been automatically generated from version 1.0.2 through the toolbox.
    v2.0.1 A forgotten foreign key pair has been added.
    v2.0.2 The views have been consolidated as tables, which allows a considerable improvement in access time.
    v2.0.3 The previous version did not include the database; in this version the database has been uploaded.
    v2.1.0 Depth images have been updated to 16-bit. Additionally, both the RGB images and the depth images are oriented in the original camera format, i.e. landscape.

  6. Human alterations of the global floodplains 1965-2019

    • catalog.data.gov
    • s.cnmilf.com
    Updated Aug 28, 2023
    Cite
    U.S. EPA Office of Research and Development (ORD) (2023). Human alterations of the global floodplains 1965-2019 [Dataset]. https://catalog.data.gov/dataset/human-alterations-of-the-global-floodplains-1965-2019
    Explore at:
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We developed the first publicly available spatially explicit estimates of the human alterations along the global floodplains during the recent 27 years (1992-2019) at 250-m resolution. To maximize the reuse of our datasets and advance the open science of human floodplain alteration, we developed three web-based programming tools: (1) Floodplain Mapping Tool, (2) Land Use Change Tool, and (3) Human Alteration Tool, supported with tutorials and step-by-step audiovisual instructions. Our data reveal a significant loss of natural floodplains worldwide, with 460,000 km2 of new agricultural and 140,000 km2 of new developed areas between 1992 and 2019. This dataset offers critical new insights into how floodplains are being destroyed, which will help decision-makers to reinforce strategies to conserve and restore floodplain functions and habitat.

    This dataset is not publicly accessible because: EPA scientists provided context and commentary but did not do any of the analyses or handle any of the data. It can be accessed through the following means: the entire data record can be downloaded as a single zip file from this web link: http://www.hydroshare.org/resource/cdb5fd97e0644a14b22e58d05299f69b.

    The global floodplain alteration dataset is derived entirely through the ArcGIS 10.5 and ENVI 5.1 geospatial analysis platforms. To assist in reuse and application of the dataset, we developed additional Python codes aggregated as three web-based tools:
    Floodplain Mapping Tool: https://colab.research.google.com/drive/1xQlARZXKPexmDInYV-EMoJ-HZxmFL-eW?usp=sharing
    Land Use Change Tool: https://colab.research.google.com/drive/1vmIaUCkL66CoTv4rNRIWpJXYXp4TlAKd?usp=sharing
    Human Alteration Tool: https://colab.research.google.com/drive/1r2zNJNpd3aWSuDV2Kc792qSEjvDbFtBy?usp=share_link
    See the Usage Notes section in the journal article for details.

    Format: The global floodplain alteration dataset is available through the HydroShare open geospatial data platform. Our data record also includes all corresponding input data, intermediate calculations, and supporting information.

    This dataset is associated with the following publication: Rajib, A., Q. Zheng, C. Lane, H. Golden, J. Christensen, I. Isibor, and K. Johnson. Human alterations of the global floodplains 1992–2019. Scientific Data. Springer Nature, New York, NY, USA, 10: 499, (2023).

  7. GEOEYE-70 | Earth Observation

    • kaggle.com
    zip
    Updated Apr 4, 2024
    Cite
    Zeyad Omar (2024). GEOEYE-70 | Earth Observation [Dataset]. https://www.kaggle.com/datasets/zeadomar/geoeye-70-earth-observation
    Explore at:
    Available download formats: zip (1277327461 bytes)
    Dataset updated
    Apr 4, 2024
    Authors
    Zeyad Omar
    Area covered
    Earth
    Description

    The GEOEYE-70 dataset is a meticulously curated collection of Earth observation data acquired from various satellites. It offers high-resolution imagery, ideal for capturing detailed features of Earth's surface, including diverse landscapes and cityscapes. Notably, this dataset was established in April 2024 as the foundation for testing pre-trained models in Vision Transformer technology.


    Overview

    The GEO dataset is a collection of high-resolution satellite and aerial imagery specifically designed for training and evaluating machine learning models for land use and land cover classification. It encompasses a diverse range of land cover types, including natural landscapes, man-made structures, and agricultural fields.

    Data Composition

    The GEO dataset consists of 69 classes, each containing 500 images sized 256 x 256 x 3 pixels (height, width, channels). This translates to a total of 34,500 images.

    GEOEYE-70 Dataset Understanding:
    - You can view the data details, publisher details, sources, and classes through the following GitHub repo: https://github.com/Ziad-o-Yusef/GEOEYE-70-Dataset
    - To browse parts of the data without downloading it, you can follow the accompanying notebook, or open the complete data on Google Colab: https://colab.research.google.com/drive/1iXmoPcRdaULZa4dfGnwAgbtR6Wxd8hNu?usp=sharing
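
    For the 69-class, 500-images-per-class layout described above, a minimal loading sketch with torchvision, assuming the download unpacks into one subfolder per class (the directory layout and root path are assumptions; adjust them to your copy):

        from torch.utils.data import DataLoader
        from torchvision import datasets, transforms

        transform = transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),
        ])

        dataset = datasets.ImageFolder(root="GEOEYE-70", transform=transform)
        loader = DataLoader(dataset, batch_size=32, shuffle=True)

        print(len(dataset.classes))   # expect 69 classes
        print(len(dataset))           # expect 34,500 images
        images, labels = next(iter(loader))
        print(images.shape)           # torch.Size([32, 3, 256, 256])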

  8. Accident Detection Model Dataset

    • universe.roboflow.com
    zip
    Updated Apr 8, 2024
    Cite
    Accident detection model (2024). Accident Detection Model Dataset [Dataset]. https://universe.roboflow.com/accident-detection-model/accident-detection-model/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 8, 2024
    Dataset authored and provided by
    Accident detection model
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Accident Bounding Boxes
    Description

    Accident-Detection-Model

    The Accident Detection Model is made using YOLOv8, Google Colab, Python, Roboflow, Deep Learning, OpenCV, Machine Learning, and Artificial Intelligence. It can detect an accident in a live camera feed, image, or video. The model is trained on a dataset of 3200+ images; these images were annotated on Roboflow.

    Problem Statement

    • Road accidents are a major problem in India, with thousands of people losing their lives and many more suffering serious injuries every year.
    • According to the Ministry of Road Transport and Highways, India witnessed around 4.5 lakh road accidents in 2019, which resulted in the deaths of more than 1.5 lakh people.
    • The age range that is most severely hit by road accidents is 18 to 45 years old, which accounts for almost 67 percent of all accidental deaths.

    Accidents survey

    Survey image: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png

    Literature Survey

    • Sreyan Ghosh (Mar 2019): the goal is to develop a system using a deep learning convolutional neural network trained to classify video frames as accident or non-accident.
    • Deeksha Gour (Sep 2019): uses computer vision technology, neural networks, deep learning, and various approaches and algorithms to detect objects.

    Research Gap

    • Lack of real-world data - we trained the model on more than 3200 images.
    • Large interpretability time and space needed - we use Google Colab to reduce the time and space required.
    • Outdated versions in previous works - we are using the latest version, YOLOv8.

    Proposed methodology

    • We are using YOLOv8 to train on our custom dataset of 3200+ images, collected from different platforms.
    • After training for 25 iterations, the model is ready to detect an accident with a significant probability.

    Model Set-up

    Preparing Custom dataset

    • We collected 1200+ images from different sources like YouTube, Google Images, Kaggle.com, etc.
    • Then we annotated all of them individually on a tool called Roboflow.
    • During annotation we marked the images with no accident as NULL, and we drew a box on the site of the accident in the images that contain one.
    • Then we divided the dataset into train, val, and test splits in the ratio 8:1:1.
    • At the final step we downloaded the dataset in YOLOv8 format.
      Using Google Colab
    • We are using Google Colaboratory to code this model because Colab provides a GPU, which is faster than most local environments.
    • Google Colab lets you write and run Python code in Jupyter notebooks, which blend code, text, and visualisations in a single document.
    • Users can run individual code cells and quickly view the results, which is helpful for experimenting and debugging. Notebooks also support visualisations using well-known frameworks like Matplotlib, Seaborn, and Plotly.
    • In Google Colab, first of all we changed the runtime from TPU to GPU.
    • We cross-checked it by running the command '!nvidia-smi'.
      Coding
    • First of all, we installed YOLOv8 with the command '!pip install ultralytics==8.0.20'.
    • Then we imported YOLOv8 with 'from ultralytics import YOLO' and 'from IPython.display import display, Image'.
    • Then we connected and mounted our Google Drive account with 'from google.colab import drive' followed by 'drive.mount('/content/drive')'.
    • Then we ran our main commands to start the training process: '%cd /content/drive/MyDrive/Accident Detection model' and '!yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=1 imgsz=640 plots=True'.
    • After the training we ran commands to test and validate our model: '!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml' and '!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt conf=0.25 source=data/test/images'.
    • Further, to get results from any video or image we ran this command: '!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt source="/content/drive/MyDrive/Accident-Detection-model/data/testing1.jpg/mp4"'.
    • The results are stored in the runs/detect/predict folder.
      Hence our model is trained, validated and tested to be able to detect accidents in any video or image; a consolidated sketch of these commands follows below.
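
    A consolidated sketch of the Colab commands listed above, gathered into one notebook cell sequence; the Drive paths and the data.yaml contents follow the description and should be adjusted to your own project:

        # Install YOLOv8 and import it
        !pip install ultralytics==8.0.20
        from ultralytics import YOLO
        from IPython.display import display, Image

        # Mount Google Drive and move into the project folder
        from google.colab import drive
        drive.mount('/content/drive')
        %cd /content/drive/MyDrive/Accident Detection model

        # Train on the custom dataset described in data.yaml
        !yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=1 imgsz=640 plots=True

        # Validate, then predict on the test images (results land in runs/detect/predict)
        !yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml
        !yolo task=detect mode=predict model=runs/detect/train/weights/best.pt conf=0.25 source=data/test/images save=True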

    Challenges I ran into

    I mainly ran into 3 problems while making this model:

    • I had difficulty saving the results in a folder; since YOLOv8 is the latest version, it is still under development. I read some blogs and referred to Stack Overflow, and learned that in the new v8 an extra argument, 'save=True', is needed; this let me save my results in a folder.
    • I was facing a problem on the CVAT website because I was not sure what
  9. The Cultural Resource Curse: How Trade Dependence Undermines Creative...

    • zenodo.org
    bin, csv
    Updated Aug 9, 2025
    Cite
    Anon Anon; Anon Anon (2025). The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries [Dataset]. http://doi.org/10.5281/zenodo.16784974
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Aug 9, 2025
    Dataset provided by
    Zenodo
    Authors
    Anon Anon; Anon Anon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the study The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries. It contains country-year panel data for 2000–2023 covering both OECD economies and the ten largest Latin American countries by land area. Variables include GDP per capita (constant PPP, USD), trade openness, internet penetration, education indicators, cultural exports per capita, and executive constraints from the Polity V dataset.

    The dataset supports a comparative analysis of how economic structure, institutional quality, and infrastructure shape cultural export performance across development contexts. Within-country fixed effects models show that trade openness constrains cultural exports in OECD economies but has no measurable effect in resource-dependent Latin America. In contrast, strong executive constraints benefit cultural industries in advanced economies while constraining them in extraction-oriented systems. The results provide empirical evidence for a two-stage development framework in which colonial extraction legacies create distinct constraints on creative industry growth.

    All variables are harmonized to ISO3 country codes and aligned on a common panel structure. The dataset is fully reproducible using the included Jupyter notebooks (OECD.ipynb, LATAM+OECD.ipynb, cervantes.ipynb).

    Contents:

    • GDPPC.csv — GDP per capita series from the World Bank.

    • explanatory.csv — Trade openness, internet penetration, and education indicators.

    • culture_exports.csv — UNESCO cultural export data.

    • p5v2018.csv — Polity V institutional indicators.

    • Jupyter notebooks for data processing and replication.

    Potential uses: Comparative political economy, cultural economics, institutional development, and resource curse research.

    How to Run This Dataset and Code in Google Colab

    These steps reproduce the OECD vs. Latin America analyses from the paper using the provided CSVs and notebooks.

    1) Open Colab and set up

    1. Go to https://colab.research.google.com

    2. Click File → New notebook.

    3. (Optional) If your files are in Google Drive, mount it:

        from google.colab import drive
        drive.mount('/content/drive')

    2) Get the data files into Colab

    You have two easy options:

    A. Upload the 4 CSVs + notebooks directly

    • In the left sidebar, click the folder icon → Upload.

    • Upload: GDPPC.csv, explanatory.csv, culture_exports.csv, p5v2018.csv, and any .ipynb you want to run.

    B. Use Google Drive

    • Put those files in a Drive folder.

    • After mounting Drive, refer to them with paths like /content/drive/MyDrive/your_folder/GDPPC.csv.
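
    Once the files are in place, a minimal sketch for assembling the country-year panel; the join keys follow the description (ISO3 country codes on a common panel structure), but the actual column names inside each CSV are assumptions, so adjust them to the real headers:

        import pandas as pd

        base = "/content/drive/MyDrive/your_folder/"   # or "/content/" if you uploaded the files directly

        # Hypothetical column names: each file is assumed to carry 'iso3' and 'year' keys
        gdp = pd.read_csv(base + "GDPPC.csv")
        expl = pd.read_csv(base + "explanatory.csv")
        culture = pd.read_csv(base + "culture_exports.csv")
        polity = pd.read_csv(base + "p5v2018.csv")

        panel = (
            gdp
            .merge(expl, on=["iso3", "year"], how="inner")
            .merge(culture, on=["iso3", "year"], how="inner")
            .merge(polity, on=["iso3", "year"], how="left")
        )

        print(panel.shape)
        print(panel.head())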

  10. Data from: Stock Market Indicators

    • kaggle.com
    zip
    Updated Jan 31, 2020
    Cite
    Alex Wilf (2020). Stock Market Indicators [Dataset]. https://www.kaggle.com/abwilf/stock-market-indicators
    Explore at:
    Available download formats: zip (23262 bytes)
    Dataset updated
    Jan 31, 2020
    Authors
    Alex Wilf
    Description

    Quickstart

    https://colab.research.google.com/drive/1W6TprjcxOdXsNwswkpm_XX2U_xld9_zZ#offline=true&sandboxMode=true

    Context

    Predicting the stock market is a game as old as the stock market itself. On popular ML platforms like Kaggle, users often compete to come up with highly nuanced, optimized models to solve the stock market starting just from price data. LSTMs may end up being the most effective model, but the real problem isn't the model - it's the data.

    Human and algorithmic traders in the financial industry know this, and augment their datasets with lots of useful information about stocks called "technical indicators". These indicators have fancy sounding names - e.g. the "Aroon Oscillator" and the "Chaikin Money Flow Index", but most boil down to simple calculations involving moving averages and volatility. Access to these indicators is unrestricted for humans (you can view them on most trading platforms), but access to well formatted indicators (csvs instead of visual lines) for large datasets reaching back significantly in time is nearly impossible to find. Even if you pay for a service, API usage limits make putting together such a dataset prohibitively expensive.
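
    As an illustration of how simple most of these calculations are once the price data is in hand, here is a minimal pandas sketch; the file name and the 'Date'/'Close' column names are assumptions about the CSV layout, not something guaranteed by this dataset:

        import pandas as pd

        prices = pd.read_csv("prices.csv", parse_dates=["Date"]).set_index("Date")

        # 20-day simple moving average of the closing price
        prices["sma_20"] = prices["Close"].rolling(window=20).mean()

        # 20-day rolling volatility of daily returns
        prices["volatility_20"] = prices["Close"].pct_change().rolling(window=20).std()

        print(prices[["Close", "sma_20", "volatility_20"]].tail())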

    The fact that this information is largely kept behind paywalls for large firms with proprietary resources makes me question the fairness of this market. With a data imbalance like this, how can a single trader - a daytrader - expect to make money? I wanted to make this data available to the ML community because it is my hope that bringing this data to the community will help to even the scales. Whether you're just looking to toy around and make a few bucks, or interested in contributing to something larger - a group of people working to develop algorithms to help the "little guy" trade - I hope this dataset will be helpful. To the best of my knowledge, this is the first dataset of its kind, but I hope it is not the last.

    Data

    Acknowledgements

    • The many online tutorials and specifications which helped me write and test the indicator functions
    • borismarjanovic for making public an amazing dataset that I use as a baseline for the colab notebook and the direct download file above
    • The many online services that have allowed me to download all the recent price information to augment Boris' dataset (which legally I cannot share, but which helped me develop the infrastructure to update the indicators given new prices data that I share in the quickstart and repo).

    Next Steps / Future Directions

    • Building inventive models using this dataset to more and more accurately predict stock price movements
    • Incorporating arbitrage analysis across stocks
    • Hedging
    • Options and selling short
    • Commodities, currencies, ETFs

    Collaboration

    If this interests you, reach out! My email is abwilf [at] umich [dot] edu. The repository I used to generate the dataset is here: https://github.com/abwilf/daytrader. I love forks. If you want to work on the project, send me a pull request!

  11. Supplementary Material for Disruptive Solutions on Requirement Engineering...

    • data.mendeley.com
    Updated Aug 1, 2022
    + more versions
    Cite
    James Chaves (2022). Supplementary Material for Disruptive Solutions on Requirement Engineering for Agile Software Development: A tertiary study [Dataset]. http://doi.org/10.17632/5gb24nwk7y.1
    Explore at:
    Dataset updated
    Aug 1, 2022
    Authors
    James Chaves
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository delivers the supplementary material for the paper: Disruptive Solutions on Requirement Engineering for Agile Software Development: A tertiary study.

    In the following, we present the abstract of the study:

    Context: Agile Software Development (ASD) is a disruptive process compared to traditional software development. Therefore, traditional Requirements Engineering (RE) forms may not be the best way to do RE for ASD (RE-ASD).

    Objective: Working with ASD using traditional RE ways could limit ASD's potential. Thus, it is necessary to investigate what academia and industry have done in RE to take full advantage of all of the capabilities of ASD beyond traditional RE.

    Method: We conducted a Tertiary Study looking for solutions for RE-ASD using the Systematic Literature Review (SLR) protocol described by Kitchenham and Charters. We then categorized the solutions into families using Targeted Coding and Constant Comparison, tools from Socio-Technical Grounded Theory (STGT). Afterward, we classified the solutions as disruptive using our model based on the Hype Level Curve concept, assessing their hype (popularity) in the software engineering community using Google Trends and Google Colab tools.

    Results: After executing the SLR protocol, we accepted 37 studies and encountered 136 solutions used by academia and industry for RE-ASD. We categorized these solutions into 21 solution families, six of which we classified as disruptive. Design Thinking (DT) and Artificial Intelligence (AI) were the two families of solutions that stood out the most. We also identified the type of solution (e.g., process, method, technique, tool, model, framework) and domain (academia or industry). Furthermore, we cataloged the challenges presented by the solutions.

    Conclusion: We concluded that only a few solutions that have been used for RE-ASD have the power to successfully challenge the mainstream Agile Software Development process by using innovation (26 out of 106). There is a gap between academia and industry regarding these disruptive solutions, and some challenges still need to be addressed in using these solutions.

    The repository contains the following:
    • Dataset from the Tertiary Study:
      o Data of the retrieved studies. It presents the classifications of the documents as 'Accepted,' 'Rejected' (with the indication of the step of the protocol the authors rejected the study), or 'Duplicated.'
      o Data of all solutions retrieved from the accepted studies
    • Socio-Technical Grounded Theory (STGT) tools:
      o Result of the use of Targeted Coding and Constant Comparison
    • The Google Colab Notebook:
      o Code in Python
      o Results

  12. Nemo-Base-V7-Tekken

    • huggingface.co
    Updated Aug 9, 2025
    Cite
    Spinax (2025). Nemo-Base-V7-Tekken [Dataset]. https://huggingface.co/datasets/NewEden-Forge/Nemo-Base-V7-Tekken
    Explore at:
    Dataset updated
    Aug 9, 2025
    Dataset authored and provided by
    Spinax
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth!

    We have a free Google Colab Tesla T4 notebook for Mistral Nemo 12b here: https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing

      ✨ Finetune for Free
    

    All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

    Unsloth supports Free Notebooks… See the full description on the dataset page: https://huggingface.co/datasets/NewEden-Forge/Nemo-Base-V7-Tekken.

  13. LLM-generated essay using PaLM from Google Gen-AI

    • kaggle.com
    zip
    Updated Nov 8, 2023
    Cite
    Kingki19 / Muhammad Rizqi (2023). LLM-generated essay using PaLM from Google Gen-AI [Dataset]. https://www.kaggle.com/datasets/kingki19/llm-generated-essay-using-palm-from-google-gen-ai/code
    Explore at:
    Available download formats: zip (519291 bytes)
    Dataset updated
    Nov 8, 2023
    Authors
    Kingki19 / Muhammad Rizqi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    In the competition LLM - Detect AI Generated Text, there is a problem: the data are imbalanced, so there are far fewer LLM-generated essays than essays written by students. You can see an EDA of the competition data in this notebook: AI or Not AI? Delving Into Essays with EDA.

    To solve this problem I made my own dataset of LLM-generated essays, generated by PaLM, to fix the imbalance in the data. You can see how the data was generated in this Google Colaboratory notebook:
    Generate LLM dataset using PaLM.ipynb

    The reason I could not generate the data in a Kaggle Notebook is that Kaggle Notebooks run in the Kaggle Docker image, which could not use my own PaLM API (my opinion).

    Columns in the data:
    - id: index
    - prompt_id: the prompt used to generate the data (you can check the prompts here)
    - text: LLM-generated essay by PaLM based on the prompt
    - generated: flag showing the essay was LLM-generated; every value in this column is 1
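
    A minimal sketch of how this dataset could be concatenated with the competition training data to reduce the class imbalance; the competition file name and its columns are assumptions here, while the columns of this dataset follow the list above:

        import pandas as pd

        # Competition training data (file name and 'text'/'generated' columns assumed)
        student_essays = pd.read_csv("train_essays.csv")

        # This dataset: PaLM-generated essays, all labeled generated = 1
        palm_essays = pd.read_csv("llm_generated_essays_palm.csv")   # hypothetical file name

        combined = pd.concat(
            [student_essays[["text", "generated"]], palm_essays[["text", "generated"]]],
            ignore_index=True,
        )

        print(combined["generated"].value_counts())   # check the new class balance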

