100+ datasets found
  1. College Student Placement Factors Dataset

    • kaggle.com
    Updated Jul 2, 2025
    Cite
    Sahil Islam007 (2025). College Student Placement Factors Dataset [Dataset]. https://www.kaggle.com/datasets/sahilislam007/college-student-placement-factors-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sahil Islam007
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📘 College Student Placement Dataset

    A realistic, large-scale synthetic dataset of 10,000 students designed to analyze factors affecting college placements.

    📄 Dataset Description

    This dataset simulates the academic and professional profiles of 10,000 college students, focusing on factors that influence placement outcomes. It includes features like IQ, academic performance, CGPA, internships, communication skills, and more.

    The dataset is ideal for:

    • Predictive modeling of placement outcomes
    • Educational exercises in classification
    • Feature importance analysis
    • End-to-end machine learning projects

    📊 Columns Description

    • College_ID: Unique ID of the college (e.g., CLG0001 to CLG0100)
    • IQ: Student’s IQ score (normally distributed around 100)
    • Prev_Sem_Result: GPA from the previous semester (range: 5.0 to 10.0)
    • CGPA: Cumulative Grade Point Average (range: ~5.0 to 10.0)
    • Academic_Performance: Annual academic rating (scale: 1 to 10)
    • Internship_Experience: Whether the student has completed any internship (Yes/No)
    • Extra_Curricular_Score: Involvement in extracurriculars (score from 0 to 10)
    • Communication_Skills: Soft skill rating (scale: 1 to 10)
    • Projects_Completed: Number of academic/technical projects completed (0 to 5)
    • Placement: Final placement result (Yes = Placed, No = Not Placed)

    🎯 Target Variable

    • Placement: This is the binary classification target (Yes/No) that you can try to predict based on the other features.
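
    As a quick baseline, here is a minimal sketch of a classifier for this target; the file name is an assumption, and the column names follow the table above:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # File name is an assumption; adjust to the CSV downloaded from Kaggle.
    df = pd.read_csv("college_student_placement_dataset.csv")
    X = pd.get_dummies(df.drop(columns=["College_ID", "Placement"]))  # encode Yes/No columns
    y = (df["Placement"] == "Yes").astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")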

    🧠 Use Cases

    • 📈 Classification Modeling (Logistic Regression, Decision Trees, Random Forest, etc.)
    • 🔍 Exploratory Data Analysis (EDA)
    • 🎯 Feature Engineering and Selection
    • 🧪 Model Evaluation Practice
    • 👩‍🏫 Academic Projects & Capstone Use

    📦 Dataset Size

    • Rows: 10,000
    • Columns: 10
    • File Format: .csv

    📚 Context

    This dataset was generated to resemble real-world data in academic institutions for research and machine learning use. While it is synthetic, the variables and relationships are crafted to mimic authentic trends observed in student placements.

    📜 License

    MIT

    🔗 Source

    Created using Python (NumPy, Pandas) with data logic designed for educational and ML experimentation purposes.

  2. Vector CTRN 1:10,000 (1991-2005) — Listed points

    • data.europa.eu
    wms
    Cite
    Vector CTRN 1:10,000 (1991-2005) — Listed points [Dataset]. https://data.europa.eu/data/datasets/r_piemon-d20faf89-4a7b-4c2a-9508-e5bc93b9162a?locale=en
    Explore at:
    wms (available download formats)
    Description

    The dataset contains the listed points (on the ground or on buildings) extracted from the Regional Numerical Technical Charter (CTRN) at the 1:10,000 scale, acquired by the Map Service of the Piedmont Region from air flights flown between 1991 and 2005. The data can be downloaded according to the cut of the sheets at the 1:50,000 scale.

  3. Predictive Maintenance Dataset (AI4I 2020)

    • kaggle.com
    Updated Nov 6, 2022
    Cite
    Stephan Matzka (2022). Predictive Maintenance Dataset (AI4I 2020) [Dataset]. https://www.kaggle.com/datasets/stephanmatzka/predictive-maintenance-dataset-ai4i-2020
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 6, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Stephan Matzka
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Please note that this is the original dataset, with additional information and proper attribution. There is at least one other version of this dataset on Kaggle that was uploaded without permission; please be fair and attribute the original author. This synthetic dataset is modeled after an existing milling machine and consists of 10,000 data points, stored as rows with 14 features in columns:

    1. UID: unique identifier ranging from 1 to 10000
    2. product ID: consisting of a letter L, M, or H for low (50% of all products), medium (30%) and high (20%) as product quality variants and a variant-specific serial number
    3. type: just the product type L, M or H from column 2
    4. air temperature [K]: generated using a random walk process later normalized to a standard deviation of 2 K around 300 K
    5. process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K.
    6. rotational speed [rpm]: calculated from a power of 2860 W, overlaid with normally distributed noise
    7. torque [Nm]: torque values are normally distributed around 40 Nm with an SD of 10 Nm and no negative values
    8. tool wear [min]: the quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process
    9. 'machine failure': a label that indicates whether the machine failed at this particular data point because at least one of the following failure modes is true

    The machine failure consists of five independent failure modes:

    10. tool wear failure (TWF): the tool is replaced or fails at a randomly selected tool wear time between 200 and 240 minutes (120 times in our dataset). At this point in time, the tool is replaced 69 times and fails 51 times (randomly assigned).
    11. heat dissipation failure (HDF): heat dissipation causes a process failure if the difference between air and process temperature is below 8.6 K and the tool's rotational speed is below 1380 rpm. This is the case for 115 data points.
    12. power failure (PWF): the product of torque and rotational speed (in rad/s) equals the power required for the process. If this power is below 3500 W or above 9000 W, the process fails, which is the case 95 times in our dataset.
    13. overstrain failure (OSF): if the product of tool wear and torque exceeds 11,000 minNm for the L product variant (12,000 for M, 13,000 for H), the process fails due to overstrain. This is true for 98 data points.
    14. random failures (RNF): each process has a 0.1 % chance to fail regardless of its process parameters. This is the case for only 5 data points, fewer than could be expected for 10,000 data points in our dataset.

    If at least one of the above failure modes is true, the process fails and the 'machine failure' label is set to 1. It is therefore not transparent to the machine learning method which of the failure modes caused the process to fail.
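
    The rule-based failure logic above is straightforward to reproduce. Below is a minimal pandas sketch for HDF, PWF, and OSF; the file name and column spellings are assumptions, so verify them against the actual CSV:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("ai4i2020.csv")  # file name is an assumption

    # HDF: temperature difference below 8.6 K and speed below 1380 rpm.
    hdf = ((df["Process temperature [K]"] - df["Air temperature [K]"]) < 8.6) \
          & (df["Rotational speed [rpm]"] < 1380)

    # PWF: power = torque * angular speed (rad/s) outside 3500-9000 W.
    power = df["Torque [Nm]"] * df["Rotational speed [rpm]"] * 2 * np.pi / 60
    pwf = (power < 3500) | (power > 9000)

    # OSF: tool wear * torque above the variant-specific threshold (minNm).
    limit = df["Type"].map({"L": 11000, "M": 12000, "H": 13000})
    osf = (df["Tool wear [min]"] * df["Torque [Nm]"]) > limit

    print(hdf.sum(), pwf.sum(), osf.sum())  # expected near 115, 95, 98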

    This dataset is part of the following publication; please cite it when using this dataset: S. Matzka, "Explainable Artificial Intelligence for Predictive Maintenance Applications," 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), 2020, pp. 69-74, doi: 10.1109/AI4I49448.2020.00023.

    The image of the milling process is the work of Daniel Smyth @ Pexels: https://www.pexels.com/de-de/foto/industrie-herstellung-maschine-werkzeug-10406128/

  4. DS926 Digital surfaces and thicknesses of selected hydrogeologic units of...

    • catalog.data.gov
    • search.dataone.org
    • +1more
    Updated Nov 1, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). DS926 Digital surfaces and thicknesses of selected hydrogeologic units of the Floridan aquifer system in Florida and parts of Georgia, Alabama, and South Carolina -- Points and control points for the top of the 10,000 mg/L total dissolved solids boundary [Dataset]. https://catalog.data.gov/dataset/ds926-digital-surfaces-and-thicknesses-of-selected-hydrogeologic-units-of-the-floridan-aqu-468c9
    Explore at:
    Dataset updated
    Nov 1, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Florida, Floridan aquifer
    Description

    Digital surfaces and thicknesses of selected hydrogeologic units of the Floridan aquifer system were developed to define an updated hydrogeologic framework as part of the U.S. Geological Survey Groundwater Resources Program. This feature class contains data points used to generate the est_10000_TDS raster. It also includes "control" points used to map the 10,000 mg/L boundary, including time-domain electromagnetic soundings; the data source is written communication from Pat Burger, St. Johns River Water Management District, 2013, and other sources.

  5. Texas Gravity Data (P199841), gravity point data

    • ecat.ga.gov.au
    • researchdata.edu.au
    Updated Jul 5, 2021
    Cite
    Commonwealth of Australia (Geoscience Australia) (2021). Texas Gravity Data (P199841), gravity point data [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/4adb6680-60c3-42e3-bb1b-5e25afb8c301
    Explore at:
    www:link-1.0-http--link, www:link-1.0-http--opendap (available download formats)
    Dataset updated
    Jul 5, 2021
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    Time period covered
    Jan 1, 1998 - Dec 31, 1998
    Description

    Gravity data measures small changes in gravity due to changes in the density of rocks beneath the Earth's surface. The data collected are processed via standard methods to ensure the response recorded is that due only to the rocks in the ground. The results produce datasets that can be interpreted to reveal the geological structure of the sub-surface. The processed data is checked for quality by GA geophysicists to ensure that the final data released by GA are fit-for-purpose. This Texas Gravity Data (P199841) contains a total of 2529 point data values acquired at a spacing between 2000 and 10000 metres. The data are located in QLD and were acquired in 1998, under project No. 199841, for the Geological Survey of Queensland (GSQ).

  6. Data from: We Just Ran Twenty-Three Million Queries of the World Bank's Web...

    • dataverse.harvard.edu
    application/x-stata +3
    Updated Apr 27, 2014
    Cite
    Harvard Dataverse (2014). We Just Ran Twenty-Three Million Queries of the World Bank's Web Site [Dataset]. http://doi.org/10.7910/DVN/25492
    Explore at:
    application/x-stata(4240905), text/plain; charset=us-ascii(1426), text/plain; charset=us-ascii(4794143), zip(41100329), text/plain; charset=us-ascii(10704), text/x-stata-syntax; charset=us-ascii(8834), application/x-stata(19242087), application/x-stata(72802087), text/x-stata-syntax; charset=us-ascii(8562), text/plain; charset=us-ascii(2774), application/x-stata(138842087), text/x-stata-syntax; charset=us-ascii(6737), application/x-stata(25482087), text/plain; charset=us-ascii(32875), text/plain; charset=us-ascii(139802), application/x-stata(69162087), text/plain; charset=us-ascii(156132), application/x-stata(164322087), text/x-stata-syntax; charset=us-ascii(1412), application/x-stata(215246) (available download formats)
    Dataset updated
    Apr 27, 2014
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    1977 - 2012
    Area covered
    World
    Description

    This study provides data from the World Bank's PovcalNet on the distribution of household income and consumption across populations for 942 country-years, organized in dta and csv files by region. Each distribution contains 10,000 data points, one for each 0.01 incremental increase in percent of people living in households at or below a given income or consumption level. In addition, a data set containing the estimated parameters of the Beta and General Quadratic Lorenz curves is provided. For reference, we also provide the Python scripts used to query the PovcalNet online tool and export data from the Mongo database used to store results of these queries, along with all do files used to clean and construct the final data sets and summary statistics.
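
    For orientation, a hedged sketch of how one of these distribution files might be used; the column names 'p' (cumulative population share) and 'y' (income or consumption level) are assumptions, so check the actual dta/csv headers:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("distribution.csv").sort_values("p")  # hypothetical file name

    # Headcount ratio: share of people at or below a poverty line.
    line = 1.90
    headcount = df.loc[df["y"] <= line, "p"].max()

    # Approximate Gini from the empirical quantile function via the Lorenz curve.
    lorenz = df["y"].cumsum() / df["y"].sum()
    gini = 1 - 2 * np.trapz(lorenz, df["p"])
    print(headcount, gini)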

  7. Car Prices Market

    • kaggle.com
    Updated Apr 1, 2023
    Cite
    Muhammed Zidan (2023). Car Prices Market [Dataset]. https://www.kaggle.com/datasets/muhammedzidan/car-prices-market/versions/1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 1, 2023
    Dataset provided by
    Kaggle
    Authors
    Muhammed Zidan
    Description

    ABOUT CAR PRICES MARKET

    This dataset provides a comprehensive list of OLD and NEW car prices in the market, with information on various factors such as car make, year, model, transmission type, and more. With over 10,000 data points, this dataset allows for in-depth analysis and exploration of the dynamics of car prices in the market, making it a valuable resource for researchers, analysts, and car enthusiasts alike.

    Content:

    Used Cars Prices

    Here you will find 78,612 records about used cars, with these distinct values per column: 60 Brand, 382 Model, 33 Modelyear, 1,839 CarModel, 1,397 AveragePrice, 893 MinimumPrice, 916 MaximumPrice, spanning over 128 Months/Year.

    New Cars Prices

    Here you will find 3,433 records about new cars: 1,119 OldPrice, 410 ChangValue, and 1,162 NewPrice values with 268 ChangeDate values, covering 49 Brand and 178 Model over 4 Years.

    You can use this data to practice:

    1. Data cleaning
    2. Data Analysis
    3. Data visualization
    4. Machine Learning Price Forecasting

    Inspiration About Dataset:

    1- Price Prediction: The dataset contains information about various car models, such as their brand, model, year, fuel type, and transmission. This information can be used to predict the price of a car using regression models.

    2- Brand Analysis: The dataset contains information about the brand of each car. You can analyze the dataset to see which brand has the highest average price.

    3- Transmission Analysis: You can analyze the dataset to see how the price of a car varies with transmission type. For example, you can see if cars with automatic transmissions have a higher or lower price than cars with manual transmissions.

    Question To Answer:

    1. Which car brand has the highest average price?
    2. Which fuel type has the highest average price?
    3. How does the price of a car vary with its age?
    4. Which transmission type is more popular among car buyers?
    5. What is the distribution of car prices across different car brands?
    6. Which car brand has the highest resale value?
    7. How does the price of a car vary with its condition (i.e., new vs. used)?
    8. Is there a relationship between the price of a car and its brand?
    9. Which car brand has the highest rate of electric or fuel cars?
    10. Can we predict the fuel efficiency of a car based on its features?
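
    As a starting point for question 1, a minimal pandas sketch; the file name is an assumption, while Brand and AveragePrice are column names taken from the content description above:

    import pandas as pd

    used = pd.read_csv("used_cars.csv")  # hypothetical file name
    by_brand = (used.groupby("Brand")["AveragePrice"]
                    .mean()
                    .sort_values(ascending=False))
    print(by_brand.head(10))  # brands with the highest average price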
  8. Hydrogeological map 1:10,000 (points)

    • data.europa.eu
    Updated Oct 15, 2020
    Cite
    (2020). Hydrogeological map 1:10,000 (points) [Dataset]. https://data.europa.eu/data/datasets/c_d020-c0502013_cartaidrogeologp
    Explore at:
    Dataset updated
    Oct 15, 2020
    Description

    Hydrogeological map 1:10,000 (points)

  9. deep1b

    • tensorflow.org
    Updated Sep 3, 2024
    Cite
    (2024). deep1b [Dataset]. https://www.tensorflow.org/datasets/catalog/deep1b
    Explore at:
    Dataset updated
    Sep 3, 2024
    Description

    Pre-trained embeddings for approximate nearest neighbor search using the cosine distance. This dataset consists of two splits:

    1. 'database': consists of 9,990,000 data points; each has the features 'embedding' (96 floats), 'index' (int64), and 'neighbors' (empty list).
    2. 'test': consists of 10,000 data points; each has the features 'embedding' (96 floats), 'index' (int64), and 'neighbors' (a list of 'index' and 'distance' values for the nearest neighbors in the database).

    To use this dataset:

    import tensorflow_datasets as tfds

    # The splits are 'database' and 'test' (there is no 'train' split).
    ds = tfds.load('deep1b', split='test')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.
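
    The 'neighbors' feature of the test split can be sanity-checked against a brute-force search. A rough sketch over a small database slice (sift1m below works the same way, but with Euclidean distance):

    import numpy as np
    import tensorflow_datasets as tfds

    db = tfds.load('deep1b', split='database[:10000]')
    test = tfds.load('deep1b', split='test[:1]')

    db_emb = np.stack([ex['embedding'].numpy() for ex in db])
    query = next(iter(test))['embedding'].numpy()

    # Cosine distance = 1 - cosine similarity.
    sims = db_emb @ query / (np.linalg.norm(db_emb, axis=1) * np.linalg.norm(query))
    print(np.argmin(1 - sims))  # index of the closest database vector in this slice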

  10. Windmill Islands 1:10000 Profiles and Transects GIS Dataset

    • researchdata.edu.au
    Updated Dec 12, 2015
    + more versions
    Cite
    Australian Antarctic Division (2015). Windmill Islands 1:10000 Profiles and Transects GIS Dataset [Dataset]. https://researchdata.edu.au/windmill-islands-110000-gis-dataset/3530787
    Explore at:
    Dataset updated
    Dec 12, 2015
    Dataset provided by
    data.gov.au
    Authors
    Australian Antarctic Division
    License

    Attribution 3.0 (CC BY 3.0) (https://creativecommons.org/licenses/by/3.0/)
    License information was derived automatically

    Description

    This dataset is one of a number of datasets containing geomorphological data relating to the Windmill Islands, Wilkes Land, Antarctica. The dataset comprises a digital point coverage linked to a separate digital database (i.e., attribute tables) in which attributes are assigned to topographic profiles and transects and to the respective samples represented along these profiles. The coverage has been built for lines and points, and the attribute tables profile.aat and profile.pat are assigned the following items respectively:

    profile.aat: profile_name, descript, descript1, descript2, descript3

    profile.pat: profile_name, site, s_elev, br_elev, s_elev_source, br_elev_source, s_elev_qual, br_elev_qual

    The data do not conform to Geoscience Australia's Data Dictionary, as they are too detailed.

    These data were compiled by Dr Ian D Goodwin from his own field notes and from the records of other workers. See the linked document at the URL below for further information.

  11. Predictive Maintenance - Dataset - Asset Explorer

    • mdep.smdh.uk
    Updated Mar 6, 2023
    Cite
    (2023). Predictive Maintenance - Dataset - Asset Explorer [Dataset]. https://mdep.smdh.uk/dataset/the-data-lab--predictive-maintenance
    Explore at:
    Dataset updated
    Mar 6, 2023
    Description

    This synthetic dataset is modeled after an existing milling machine and consists of 10,000 data points, stored as rows with 14 features in columns.

  12. sift1m

    • tensorflow.org
    • huggingface.co
    Updated Sep 3, 2024
    Cite
    (2024). sift1m [Dataset]. https://www.tensorflow.org/datasets/catalog/sift1m
    Explore at:
    Dataset updated
    Sep 3, 2024
    Description

    Pre-trained embeddings for approximate nearest neighbor search using the Euclidean distance. This dataset consists of two splits:

    1. 'database': consists of 1,000,000 data points; each has the features 'embedding' (128 floats), 'index' (int64), and 'neighbors' (empty list).
    2. 'test': consists of 10,000 data points; each has the features 'embedding' (128 floats), 'index' (int64), and 'neighbors' (a list of 'index' and 'distance' values for the nearest neighbors in the database).

    To use this dataset:

    import tensorflow_datasets as tfds

    # The splits are 'database' and 'test' (there is no 'train' split).
    ds = tfds.load('sift1m', split='test')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  13. Training dataset for the TRENDY method

    • zenodo.org
    zip
    Updated Feb 26, 2025
    Cite
    Yue Wang; Yue Wang (2025). Training dataset for the TRENDY method [Dataset]. http://doi.org/10.1101/2024.10.14.618189
    Explore at:
    zip (available download formats)
    Dataset updated
    Feb 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yue Wang; Yue Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Oct 14, 2024
    Description

    This dataset is used for training the TRENDY method for gene regulatory network inference. It also contains the SINC test data set.

    For a brief description of the code for the TRENDY method, see https://github.com/YueWangMathbio/TRENDY.

    See https://github.com/YueWangMathbio/TRENDY/blob/main/GRN_transformer.pdf for the manuscript of the TRENDY method.

    To use the data:

    1. Download all files from https://github.com/YueWangMathbio/TRENDY

    2. Download all files from this database (both https://zenodo.org/records/14927741 and https://zenodo.org/records/13929908)

    3. In the folder with all files from GitHub, create a folder named "total_data_10", and unzip all files named "dataset....zip" into this folder

    4. Unzip "rev_wendy_all_10.zip" in the folder with all files from GitHub

    5. Unzip "SINC_data.zip", and put the files into the folder "SINC"

    The "total_data_10" folder will contain 102 groups of data, where each group has eight files with different name endings:

    xxx_A: 1000 ground truth gene regulatory networks, each of size 10*10

    xxx_cov: 11000 covariance matrices for 1000 samples at 11 time points, each of size 10*10

    xxx_data: 1000 gene expression samples, each of size 100*10*11 (100 cells, 10 genes, 11 time points)

    xxx_genie: 10000 inferred gene regulatory networks by GENIE3 method for 1000 samples at 10 time points, each of size 10*10

    xxx_nlode: 1000 inferred gene regulatory networks by NonlinearODEs method for 1000 samples, each of size 10*10

    xxx_revcov: 10000 constructed pseudo covariance matrices for 1000 samples at 10 time points, each of size 10*10

    xxx_sinc: 1000 inferred gene regulatory networks by SINCERITIES method for 1000 samples, each of size 10*10

    xxx_wendy: 10000 inferred gene regulatory networks by WENDY method for 1000 samples at 10 time points, each of size 10*10

    The "rev_wendy_all_10" folder will contain two groups of data, where each group has eight files with different name endings:

    xxx_ktstar: 10000 inferred covariance matrices by the first half of TRENDY for 1000 samples at 10 time points, each of size 10*10

    xxx_revwendy: 10000 inferred gene regulatory networks by the first half of TRENDY for 1000 samples at 10 time points, each of size 10*10

    The first 100 groups, with numbering, are for training. The one group with "val" is for validation. The one group with "test" is for testing.

    If you want to train or test new GRN inference methods, then just use the xxx_A files and xxx_data files.
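
    Purely for orientation, a sketch of that workflow; the file names and the assumption that the arrays load with NumPy are illustrative only, so check the loading code in the TRENDY repository first:

    import numpy as np

    # Hypothetical file names; the real ones come from the unzipped archives above.
    A = np.load("total_data_10/dataset000_A.npy")        # (1000, 10, 10) ground-truth GRNs
    data = np.load("total_data_10/dataset000_data.npy")  # (1000, 100, 10, 11) expression samples

    for truth, sample in zip(A, data):
        pass  # run your GRN inference method on `sample`, score it against `truth`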

  14. CFD

    • huggingface.co
    Updated Jun 16, 2025
    Cite
    Allanatrix (2025). CFD [Dataset]. https://huggingface.co/datasets/Allanatrix/CFD
    Explore at:
    Dataset updated
    Jun 16, 2025
    Authors
    Allanatrix
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Navier-Stokes Simulated Flow Dataset for PINNs

    Welcome to the Dataset!

    Dive into the dynamic world of fluid flow with the Navier-Stokes Simulated Flow Dataset for PINNs! This collection of 10,000 simulated data points captures the essence of fluid dynamics in a 2D channel, tailored specifically for training Physics-Informed Neural Networks (PINNs). With an even split of 5,000 laminar flow and 5,000 turbulent flow points, this dataset is perfect for researchers, data… See the full description on the dataset page: https://huggingface.co/datasets/Allanatrix/CFD.
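
    A minimal way to pull the data with the Hugging Face datasets library; the split name and schema are assumptions, so inspect the dataset page for the actual layout:

    from datasets import load_dataset

    ds = load_dataset("Allanatrix/CFD", split="train")  # split name is an assumption
    print(ds.column_names)
    print(ds[0])  # one simulated flow point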

  15. Popular movies dataset

    • kaggle.com
    Updated Jun 29, 2025
    Cite
    Rajan (2025). Popular movies dataset [Dataset]. https://www.kaggle.com/datasets/rajansavaliya22/popular-movies-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rajan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This file contains metadata for 10,000 movies. The dataset consists of movies released on or before June 2025. Data points include movie title, TMDB id, original language, genres, release date, revenue, budget, runtime, and an overview of the movie.

    Content

    This dataset consists of following files:

    popular_movies.csv: Contains information about movies i.e. title, tmdb id, original_language, genres, release date, revenue, budget, runtime and overview.

    credits.csv: Contains details about each movie's cast and the crew members who worked on it.

    Acknowledgements

    This dataset is an ensemble of data collected from TMDB. The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself at the TMDB website.

  16. Geological database, 1:10.000 - Geomorphological and anthropic elements...

    • ckan.mobidatalab.eu
    wfs, wms, zip
    Updated Apr 27, 2023
    Cite
    GeoDatiGovIt RNDT (2023). Geological database, 1:10.000 - Geomorphological and anthropic elements (points) - 10k [Dataset]. https://ckan.mobidatalab.eu/dataset/geological-database-1-10-000-geomorphological-and-anthropic-elements-10k-points
    Explore at:
    wfs, wms, zip (available download formats)
    Dataset updated
    Apr 27, 2023
    Dataset provided by
    GeoDatiGovIt RNDT
    Description

    Georeferenced vector database containing the geomorphological and anthropic elements, in point form, of the mountainous regional territory, surveyed at the acquisition scale of 1:10,000. The geographical area covered includes the regional Apennine territory.

  17. MedSynth

    • huggingface.co
    Updated Jul 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ahmad Rezaie (2025). MedSynth [Dataset]. https://huggingface.co/datasets/Ahmad0067/MedSynth
    Explore at:
    Dataset updated
    Jul 15, 2025
    Authors
    Ahmad Rezaie
    Description

    Dataset Card for MedSynth

    The MedSynth dataset contains synthetic medical dialogue–note pairs developed for the medical dialogue-to-note summarization task.

    Dataset Details

    Dataset Description

    The dataset covers 2000 ICD-10 codes, with five data points per code, resulting in a total of more than 10,000 data points. The notes are in SOAP format.
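
    A minimal loading sketch with the Hugging Face datasets library; the split name is an assumption, while the repo id comes from the citation above:

    from datasets import load_dataset

    ds = load_dataset("Ahmad0067/MedSynth", split="train")  # split name is an assumption
    print(ds[0])  # one synthetic dialogue-note pair, with the note in SOAP format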

    Uses

    MedSynth should not be used as a reliable source of medical information. It is intended solely to… See the full description on the dataset page: https://huggingface.co/datasets/Ahmad0067/MedSynth.

  18. Damped pendulum for nonlinear system identification - inputs are sampled...

    • darus.uni-stuttgart.de
    Updated Feb 26, 2025
    Cite
    Daniel Frank (2025). Damped pendulum for nonlinear system identification - inputs are sampled from a multivariate-normal distribution - synthetically generated [Dataset]. http://doi.org/10.18419/DARUS-4770
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 26, 2025
    Dataset provided by
    DaRUS
    Authors
    Daniel Frank
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    Overview

    This dataset contains input-output data of a damped nonlinear pendulum that is actuated at the mounting point. The data was generated with statesim [1], a Python package for simulating linear and nonlinear ODEs, for the system "actuated pendulum". The configuration .json files for the corresponding datasets (in-distribution and out-of-distribution) can be found in the respective folders. After creating the dataset, the files are stored in the raw folder. Then, they are split into subsets for training, testing, and validation, which can be found in the processed folder; details about the splitting are found in the config.json file. The dataset can be used to test system identification algorithms and methods that aim to identify nonlinear dynamics from input-output measurements. The training dataset is used to optimize the model parameters, the validation set for hyperparameter optimization, and the test set only for the final evaluation. In [2], the authors used the same underlying dynamics to create their dataset, but without damping terms.

    Input generation

    Input trajectories are sampled from a multivariate normal distribution.

    Noise

    Gaussian white noise of approximately 30 dB is added at the output.

    Statistics

    The input and output size is one.

    In-distribution data: 2,100,000 data points
    • Training: 10,000 trajectories of length 150
    • Validation: 2,000 trajectories of length 150
    • Test: 2,000 trajectories of length 150

    Out-of-distribution data: 7 × 100,000 data points
    • 7 different datasets used only for testing; each dataset contains 200 trajectories of length 500

    References

    [1] Frank, D. statesim [Computer software]. https://github.com/Dany-L/statesim
    [2] Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3), 218-229.
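
    For intuition, a hedged sketch of this kind of system; the coefficients, input handling, and noise level are placeholders, not the values in the dataset's config.json:

    import numpy as np
    from scipy.integrate import solve_ivp

    # Illustrative damped, actuated pendulum: theta'' = -(g/l)*sin(theta) - c*theta' + u(t).
    g_over_l, c = 9.81, 0.5
    rng = np.random.default_rng(0)
    u = rng.normal(size=150)  # normally distributed input, one length-150 trajectory

    def f(t, x):
        theta, omega = x
        u_t = u[min(int(t), len(u) - 1)]  # zero-order hold on the sampled input
        return [omega, -g_over_l * np.sin(theta) - c * omega + u_t]

    sol = solve_ivp(f, (0, len(u)), [0.0, 0.0], t_eval=np.arange(len(u)))
    y = sol.y[0] + rng.normal(scale=0.05, size=len(u))  # noisy output measurement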

  19. Gravity Survey (P198089), gravity point data

    • ecat.ga.gov.au
    • researchdata.edu.au
    Updated Jul 5, 2021
    + more versions
    Cite
    Commonwealth of Australia (Geoscience Australia) (2021). Gravity Survey (P198089), gravity point data [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/e6837759-09d9-4a13-bfb1-23ab50916e33
    Explore at:
    www:link-1.0-http--link, www:link-1.0-http--opendap (available download formats)
    Dataset updated
    Jul 5, 2021
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    Time period covered
    Jan 1, 1980 - Dec 31, 1980
    Description

    Gravity data measures small changes in gravity due to changes in the density of rocks beneath the Earth's surface. The data collected are processed via standard methods to ensure the response recorded is that due only to the rocks in the ground. The results produce datasets that can be interpreted to reveal the geological structure of the sub-surface. The processed data is checked for quality by GA geophysicists to ensure that the final data released by GA are fit-for-purpose. This Gravity Survey (P198089) contains a total of 461 point data values acquired at a spacing between 450 and 10000 metres. The data are located in SA and were acquired in 1980 under project No. 198089; no client organization is recorded.

  20. Crypto 30 Minute Price Data (VWAP) | 10,000 Cryptocurrency Tickers | +65 DEX...

    • datarade.ai
    .json
    Updated Apr 13, 2024
    Cite
    Blocksize (2024). Crypto 30 Minute Price Data (VWAP)| 10,000 Cryptocurrency Tickers | +65 DEX & CEX | No Rate Limits [Dataset]. https://datarade.ai/data-products/crypto-30-minute-price-data-vwap-10-000-cryptocurrency-tic-blocksize
    Explore at:
    .json (available download formats)
    Dataset updated
    Apr 13, 2024
    Dataset authored and provided by
    Blocksize
    Area covered
    Thailand, Guadeloupe, Iran (Islamic Republic of), Wallis and Futuna, Greece, Italy, Senegal, Saint Martin (French part), Malaysia, British Indian Ocean Territory
    Description

    Access our data for free: https://matrix.blocksize.capital/auth/open/sign-up

    The Blocksize 30-Minute VWAP Feed provides precise, time-anchored pricing snapshots for digital assets, updated every 30 minutes around the clock. Designed for use cases where regular and unbiased price reference points are essential — such as portfolio valuation, fund NAV calculation, settlement, or compliance reporting — this feed offers volume-weighted average prices based on executed trades across a broad and continuously vetted set of exchanges.

    Each pricing point is calculated using trade data observed during the 30-minute interval immediately preceding each half-hour mark (e.g., 00:30, 01:00, 01:30 UTC, etc.). For each interval, the final price is derived from the volume-weighted average of the last trade events on all reporting exchanges. This method ensures that higher-volume trades contribute more significantly to the resulting price, offering a fair and liquidity-sensitive reflection of market value.
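
    The interval logic reduces to a standard VWAP computation. A minimal sketch with an assumed trade-record layout of (timestamp, price, volume) tuples, not Blocksize's actual schema:

    from datetime import timedelta

    def interval_vwap(trades, mark, last_valid_price):
        """VWAP of trades in the 30 minutes before `mark`; trades are (time, price, volume)."""
        window = [(p, v) for t, p, v in trades
                  if mark - timedelta(minutes=30) <= t < mark and v > 0]
        if not window:               # no valid data for the interval:
            return last_valid_price  # fall back to the last available valid price
        total_volume = sum(v for _, v in window)
        return sum(p * v for p, v in window) / total_volume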

    To ensure accuracy and data integrity, only validated trade events with complete volume, price, and timestamp information are considered. Any incomplete, malformed, or delayed exchange data is automatically excluded from the calculation. In the rare event that no valid data is available for a given interval, the feed defaults to the last available valid price to preserve pricing continuity — a critical feature for settlement systems and automated pipelines.

    The feed also benefits from active oversight and quality assurance by Blocksize’s internal data committee. Exchanges that show recurring anomalies or inconsistencies are removed from the input set until verified corrections are made, while new sources are added only after rigorous integrity checks. This combination of automation, governance, and data hygiene ensures that the 30-minute VWAP feed remains a trusted pricing oracle for digital asset markets, even during volatile or low-liquidity periods.

    Our Customers:

    • Oracles & DeFi Protocols and Applications
    • Asset & Fund Managers investing in digital assets
    • Asset Custodians storing digital assets
    • Banks, Brokers with crypto offering
    • Traditional Data Providers planning to extend their offering to digital assets
    • Information Provider platforms

    Questions? Reach out to our qualified data team.

    PII Statement: Our datasets do not include personal, pseudonymized, or sensitive user data.
