MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A realistic, large-scale synthetic dataset of 10,000 students designed to analyze factors affecting college placements.
This dataset simulates the academic and professional profiles of 10,000 college students, focusing on factors that influence placement outcomes. It includes features like IQ, academic performance, CGPA, internships, communication skills, and more.
The dataset is ideal for:
Column Name | Description |
---|---|
College_ID | Unique ID of the college (e.g., CLG0001 to CLG0100) |
IQ | Student’s IQ score (normally distributed around 100) |
Prev_Sem_Result | GPA from the previous semester (range: 5.0 to 10.0) |
CGPA | Cumulative Grade Point Average (range: ~5.0 to 10.0) |
Academic_Performance | Annual academic rating (scale: 1 to 10) |
Internship_Experience | Whether the student has completed any internship (Yes/No) |
Extra_Curricular_Score | Involvement in extracurriculars (score from 0 to 10) |
Communication_Skills | Soft skill rating (scale: 1 to 10) |
Projects_Completed | Number of academic/technical projects completed (0 to 5) |
Placement | Final placement result (Yes = Placed, No = Not Placed) |
This dataset was generated to resemble real-world data in academic institutions for research and machine learning use. While it is synthetic, the variables and relationships are crafted to mimic authentic trends observed in student placements.
MIT
Created using Python (NumPy, Pandas) with data logic designed for educational and ML experimentation purposes.
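A minimal usage sketch (not part of the dataset's own tooling), assuming the table is shipped as a single CSV containing the columns listed above; the file name is a placeholder:

# Minimal sketch: predict Placement from the columns described above.
# The CSV file name is hypothetical; adjust it to the actual file.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("college_student_placement.csv")  # placeholder file name

# Encode the Yes/No columns as 0/1.
for col in ["Internship_Experience", "Placement"]:
    df[col] = df[col].map({"No": 0, "Yes": 1})

X = df.drop(columns=["College_ID", "Placement"])
y = df["Placement"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))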
The dataset contains the listed points (on the ground or on buildings) extracted from the Regional Numerical Technical Map (CTRN) at the 1:10,000 scale, acquired by the Map Service of the Piedmont Region from aerial surveys flown between 1991 and 2005. The data can be downloaded according to the sheet cut at the 1:50,000 scale.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Please note that this is the original dataset with additional information and proper attribution. There is at least one other version of this dataset on Kaggle that was uploaded without permission. Please be fair and attribute the original author. This synthetic dataset is modeled after an existing milling machine and consists of 10,000 data points, stored as rows with 14 features in columns.
The machine failure consists of five independent failure modes:
1. tool wear failure (TWF): the tool is replaced or fails at a randomly selected tool wear time between 200 and 240 min (120 times in our dataset). At this point in time, the tool is replaced 69 times and fails 51 times (randomly assigned).
2. heat dissipation failure (HDF): heat dissipation causes a process failure if the difference between air and process temperature is below 8.6 K and the tool's rotational speed is below 1,380 rpm. This is the case for 115 data points.
3. power failure (PWF): the product of torque and rotational speed (in rad/s) equals the power required for the process. If this power is below 3,500 W or above 9,000 W, the process fails, which is the case 95 times in our dataset.
4. overstrain failure (OSF): if the product of tool wear and torque exceeds 11,000 minNm for the L product variant (12,000 for M, 13,000 for H), the process fails due to overstrain. This is true for 98 data points.
5. random failures (RNF): each process has a 0.1 % chance of failing regardless of its process parameters. This is the case for only 5 data points, fewer than would be expected for the 10,000 data points in our dataset.
If at least one of the above failure modes is true, the process fails and the 'machine failure' label is set to 1. It is therefore not transparent to the machine learning method which of the failure modes caused the process to fail.
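The failure rules above are simple enough to express directly in code. The following is a sketch of how the five modes combine into the binary 'machine failure' label; it is not the original data generator, and the function name and the explicit tool-wear replacement time argument are illustrative:

# Sketch of the five failure rules; not the original generator.
import math
import random

def machine_failure(product_type, air_temp_K, process_temp_K,
                    rotational_speed_rpm, torque_Nm, tool_wear_min,
                    twf_replace_time_min):
    # TWF: tool replaced or failed at a randomly chosen wear time (200-240 min).
    twf = tool_wear_min >= twf_replace_time_min
    # HDF: small air/process temperature difference at low rotational speed.
    hdf = (process_temp_K - air_temp_K) < 8.6 and rotational_speed_rpm < 1380
    # PWF: power = torque * rotational speed (rad/s); failure outside 3500-9000 W.
    power_W = torque_Nm * rotational_speed_rpm * 2 * math.pi / 60
    pwf = power_W < 3500 or power_W > 9000
    # OSF: tool wear * torque above the variant-specific overstrain limit.
    osf_limit = {"L": 11000, "M": 12000, "H": 13000}[product_type]
    osf = tool_wear_min * torque_Nm > osf_limit
    # RNF: 0.1 % random failure chance, independent of the process parameters.
    rnf = random.random() < 0.001
    return int(twf or hdf or pwf or osf or rnf)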
This dataset is part of the following publication, please cite when using this dataset: S. Matzka, "Explainable Artificial Intelligence for Predictive Maintenance Applications," 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), 2020, pp. 69-74, doi: 10.1109/AI4I49448.2020.00023.
The image of the milling process is the work of Daniel Smyth @ Pexels: https://www.pexels.com/de-de/foto/industrie-herstellung-maschine-werkzeug-10406128/
Digital surfaces and thicknesses of selected hydrogeologic units of the Floridan aquifer system were developed to define an updated hydrogeologic framework as part of the U.S. Geological Survey Groundwater Resources Program. This feature class contains data points used to generate the est_10000_TDS raster. It also includes "control" points used to map the 10,000 boundary, including time-domain electromagnetic soundings; the data source is written communication from Pat Burger, St. Johns River Water Management District, 2013, and other sources.
Gravity data measure small changes in gravity due to changes in the density of rocks beneath the Earth's surface. The data collected are processed via standard methods to ensure the response recorded is due only to the rocks in the ground. The results produce datasets that can be interpreted to reveal the geological structure of the sub-surface. The processed data are checked for quality by GA geophysicists to ensure that the final data released by GA are fit for purpose. This Texas Gravity Data (P199841) contains a total of 2,529 point data values acquired at a spacing between 2,000 and 10,000 metres. The data are located in QLD and were acquired in 1998 under project No. 199841 for the Geological Survey of Queensland (GSQ).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This study provides data from the World Bank's PovcalNet on the distribution of household income and consumption across populations for 942 country-years, organized in dta and csv files by region. Each distribution contains 10,000 data points, one for each 0.01 incremental increase in percent of people living in households at or below a given income or consumption level. In addition, a data set containing the estimated parameters of the Beta and General Quadratic Lorenz curves is provided. For reference, we also provide the Python scripts used to query the PovcalNet online tool and export data from the Mongo database used to store results of these queries, along with all do files used to clean and construct the final data sets and summary statistics.
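One straightforward use of the 10,000-point distributions is to compute inequality statistics directly. The sketch below estimates a Gini coefficient from one country-year file; the file and column names are placeholders, and the rows are assumed to be ordered by the 0.01-percent population increments described above:

import numpy as np
import pandas as pd

df = pd.read_csv("distribution_one_country_year.csv")  # placeholder file name
p = np.arange(1, len(df) + 1) / len(df)                # cumulative population share
income = df["income_or_consumption"].to_numpy()        # placeholder column name

# Lorenz curve: cumulative share of total income up to each population share.
lorenz = np.cumsum(income) / income.sum()
# Gini = 1 - 2 * (area under the Lorenz curve), via the trapezoidal rule.
gini = 1 - 2 * np.trapz(lorenz, p)
print(f"Gini coefficient: {gini:.3f}")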
This dataset provides a comprehensive list of OLD and NEW car prices in the market, with information on various factors such as car make, year, model, transmission type, and more. With over 10,000 data points, this dataset allows for in-depth analysis and exploration of the dynamics of car prices in the market, making it a valuable resource for researchers, analysts, and car enthusiasts alike.
Here you will find 78,612 records about used cars, covering 60 distinct Brand, 382 Model, 33 Modelyear, 1,839 CarModel, 1,397 AveragePrice, 893 MinimumPrice, and 916 MaximumPrice values, over 128 Months/Years.
Here you will find 3,433 records about new cars: 1,119 OldPrice, 410 ChangValue, and 1,162 NewPrice values with 268 ChangeDate entries, across 49 Brand and 178 Model values, over 4 Years.
1- Price Prediction: The dataset contains information about various car models, such as their brand, model, year, fuel type, and transmission. This information can be used to predict the price of a car using regression models (a minimal sketch follows this list).
2- Brand Analysis: The dataset contains information about the brand of each car. You can analyze the dataset to see which brand has the highest average price.
3- Transmission Analysis: You can analyze the dataset to see how the price of a car varies with transmission type. For example, you can see if cars with automatic transmissions have a higher or lower price than cars with manual transmissions.
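As a rough illustration of the price-prediction use case (point 1), the sketch below fits a baseline regression on the used-car table; the file name is a placeholder, and only columns named in the description above (Brand, Model, Modelyear, AveragePrice) are used:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

df = pd.read_csv("used_cars.csv")        # placeholder file name
X = df[["Brand", "Model", "Modelyear"]]  # assumed feature columns
y = df["AveragePrice"]                   # assumed target column

# One-hot encode the categorical columns; pass Modelyear through as numeric.
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["Brand", "Model"])],
    remainder="passthrough",
)
model = make_pipeline(pre, Ridge(alpha=1.0))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))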
Pre-trained embeddings for approximate nearest neighbor search using the cosine distance. This dataset consists of two splits:
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of the deep1b embeddings.
ds = tfds.load('deep1b', split='train')

# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is one of a number of datasets containing geomorphological data relating to the Windmill Islands, Wilkes Land, Antarctica. The dataset comprises a digital point coverage which is linked to a separate digital database (i.e. attribute tables) in which attributes are assigned to topographic profiles and transects and to the respective samples represented along these profiles. The coverage has been built for lines and points, and the attribute tables profile.aat and profile.pat are assigned the following items respectively:
profile.aat: profile_name, descript, descript1, descript2, descript3
profile.pat: profile_name, site, s_elev, br_elev, s_elev_source, br_elev_source, s_elev_qual, br_elev_qual
Does not conform to Geoscience Australia's Data Dictionary, as it is too detailed.
These data were compiled by Dr Ian D Goodwin from his own field notes and from the records of other workers. See the linked document at the URL below for further information.
This synthetic dataset is modeled after an existing milling machine and consists of 10,000 data points stored as rows with 14 features in columns.
Pre-trained embeddings for approximate nearest neighbor search using the Euclidean distance. This dataset consists of two splits:
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of the sift1m embeddings.
ds = tfds.load('sift1m', split='train')

# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is used for training the TRENDY method for gene regulatory network inference. It also contains the SINC test data set.
For a brief description of the code for the TRENDY method, see https://github.com/YueWangMathbio/TRENDY.
See https://github.com/YueWangMathbio/TRENDY/blob/main/GRN_transformer.pdf for the manuscript describing the TRENDY method.
To use the data:
1. Download all files from https://github.com/YueWangMathbio/TRENDY.
2. Download all files from this database (both https://zenodo.org/records/14927741 and https://zenodo.org/records/13929908).
3. In the folder with all files from GitHub, create a folder named "total_data_10", and unzip all files named "dataset....zip" into this folder.
4. Unzip "rev_wendy_all_10.zip" in the folder with all files from GitHub.
5. Unzip "SINC_data.zip", and put the extracted files into the folder "SINC".
The "total_data_10" folder will contain 102 groups of data, where each group has eight files with different name endings:
xxx_A: 1000 ground truth gene regulatory networks, each of size 10*10
xxx_cov: 11000 covariance matrices for 1000 samples at 11 time points, each of size 10*10
xxx_data: 1000 gene expression samples, each of size 100*10*11 (100 cells, 10 genes, 11 time points)
xxx_genie: 10000 inferred gene regulatory networks by GENIE3 method for 1000 samples at 10 time points, each of size 10*10
xxx_nlode: 1000 inferred gene regulatory networks by NonlinearODEs method for 1000 samples, each of size 10*10
xxx_revcov: 10000 constructed pseudo covariance matrices for 1000 samples at 10 time points, each of size 10*10
xxx_sinc: 1000 inferred gene regulatory networks by SINCERITIES method for 1000 samples, each of size 10*10
xxx_wendy: 10000 inferred gene regulatory networks by WENDY method for 1000 samples at 10 time points, each of size 10*10
The "rev_wendy_all_10" folder will contain two groups of data, where each group has eight files with different name endings:
xxx_ktstar: 10000 inferred covariance matrices by the first half of TRENDY for 1000 samples at 10 time points, each of size 10*10
xxx_revwendy: 10000 inferred gene regulatory networks by the first half of TRENDY for 1000 samples at 10 time points, each of size 10*10
The first 100 groups, identified by number, are for training. The group labeled "val" is for validation. The group labeled "test" is for testing.
If you want to train or test new GRN inference methods, just use the xxx_A and xxx_data files; a minimal loading sketch follows.
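A minimal loading sketch, assuming the unzipped files are NumPy arrays saved with numpy.save (check the TRENDY repository for the actual file format and names; the group name below is a placeholder):

import numpy as np

A = np.load("total_data_10/group001_A.npy")        # placeholder name; (1000, 10, 10) ground-truth GRNs
data = np.load("total_data_10/group001_data.npy")  # placeholder name; (1000, 100, 10, 11) expression samples

# One training pair: expression trajectories of sample 0 and its true network.
expr_0, A_0 = data[0], A[0]
print(expr_0.shape, A_0.shape)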
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Navier-Stokes Simulated Flow Dataset for PINNs
Welcome to the Dataset!
Dive into the dynamic world of fluid flow with the Navier-Stokes Simulated Flow Dataset for PINNs! This collection of 10,000 simulated data points captures the essence of fluid dynamics in a 2D channel, tailored specifically for training Physics-Informed Neural Networks (PINNs). With an even split of 5,000 laminar flow and 5,000 turbulent flow points, this dataset is perfect for researchers, data… See the full description on the dataset page: https://huggingface.co/datasets/Allanatrix/CFD.
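A minimal loading sketch, assuming the data can be pulled with the Hugging Face datasets library from the page linked above; the split and column names are not documented here, so inspect the loaded object first:

from datasets import load_dataset

ds = load_dataset("Allanatrix/CFD")
print(ds)                    # available splits and columns
first_split = list(ds.keys())[0]
print(ds[first_split][0])    # inspect one simulated flow data point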
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This file contains metadata for 10,000 movies. The dataset consists of movies released on or before June 2025. Data points include movie title, TMDB id, original language, genres, release date, revenue, budget, runtime, and an overview of the movie.
This dataset consists of the following files (a loading sketch follows the list):
popular_movies.csv: Contains information about movies, i.e. title, tmdb id, original_language, genres, release date, revenue, budget, runtime, and overview.
credits.csv: Contains details about the cast of each movie and the crew members who worked on it.
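A minimal loading sketch for the two files; the join key is assumed to be a column named "tmdb_id" in both files, so check the actual headers before merging:

import pandas as pd

movies = pd.read_csv("popular_movies.csv")
credits = pd.read_csv("credits.csv")

# Attach cast and crew details to each movie by TMDB id (assumed column name).
combined = movies.merge(credits, on="tmdb_id", how="left")
print(combined.head())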
This dataset is an ensemble of data collected from TMDB. The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself here.
Georeferenced vector database containing the geomorphological and anthropic elements, in point form, of the mountainous regional territory, surveyed at the 1:10,000 acquisition scale. The geographical area covered includes the regional Apennine territory.
Dataset Card for MedSynth
The MedSynth dataset contains synthetic medical dialogue–note pairs developed for the medical dialogue-to-note summarization task.
Dataset Details
Dataset Description
The dataset covers 2000 ICD-10 codes, with five data points per code, resulting in a total of more than 10,000 data points. The notes are in SOAP format.
Uses
MedSynth should not be used as a reliable source of medical information. It is intended solely to… See the full description on the dataset page: https://huggingface.co/datasets/Ahmad0067/MedSynth.
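A minimal loading sketch, assuming the dataset loads directly through the Hugging Face datasets library from the page linked above; the field names are not listed here, so print one record to see the dialogue and SOAP-note columns:

from datasets import load_dataset

medsynth = load_dataset("Ahmad0067/MedSynth")
print(medsynth)                     # available splits and their sizes
first_split = list(medsynth.keys())[0]
print(medsynth[first_split][0])     # one dialogue-note pair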
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This dataset contains input-output data of a damped nonlinear pendulum that is actuated at the mounting point. The data were generated with statesim [1], a Python package for simulating linear and nonlinear ODEs, for the actuated pendulum system. The configuration .json files for the corresponding datasets (in-distribution and out-of-distribution) can be found in the respective folders. After creating the dataset, the files are stored in the raw folder. They are then split into subsets for training, testing, and validation, which can be found in the processed folder; details about the splitting are in the config.json file. The dataset can be used to test system identification algorithms and methods that aim to identify nonlinear dynamics from input-output measurements. The training dataset is used to optimize the model parameters, the validation set for hyperparameter optimization, and the test set only for the final evaluation. In [2], the authors used the same underlying dynamics to create their dataset, but without damping terms.
Input generation: Input trajectories are sampled from a multivariate normal distribution (a sketch follows below).
Noise: Gaussian white noise of approximately 30 dB is added at the output.
Statistics: The input and output size is one.
In-distribution data: 2,100,000 data points
- Training: 10,000 trajectories of length 150
- Validation: 2,000 trajectories of length 150
- Test: 2,000 trajectories of length 150
Out-of-distribution data: 7 x 100,000 data points
- 7 different datasets were used only for testing. Each dataset contains 200 trajectories of length 500.
References:
[1] Frank, D. statesim [Computer software]. https://github.com/Dany-L/statesim
[2] Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3), 218-229.
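The sketch below illustrates the two generation steps described above (input sampling and output noise); it is not the statesim code, and the covariance matrix and the placeholder output signal are illustrative only:

import numpy as np

rng = np.random.default_rng(0)

# Input: one in-distribution trajectory of length 150 with scalar input.
# The identity covariance is a placeholder for the actual sampling distribution.
u = rng.multivariate_normal(mean=np.zeros(150), cov=np.eye(150))

# Placeholder for the simulated pendulum output driven by u.
y = np.sin(np.linspace(0.0, 10.0, 150))

# Add Gaussian white noise at roughly 30 dB signal-to-noise ratio.
snr_db = 30.0
noise_power = np.mean(y ** 2) / (10 ** (snr_db / 10))
y_noisy = y + rng.normal(scale=np.sqrt(noise_power), size=y.shape)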
Gravity data measure small changes in gravity due to changes in the density of rocks beneath the Earth's surface. The data collected are processed via standard methods to ensure the response recorded is due only to the rocks in the ground. The results produce datasets that can be interpreted to reveal the geological structure of the sub-surface. The processed data are checked for quality by GA geophysicists to ensure that the final data released by GA are fit for purpose. This Gravity Survey (P198089) contains a total of 461 point data values acquired at a spacing between 450 and 10,000 metres. The data are located in SA and were acquired in 1980 under project No. 198089.
Access our data for free: https://matrix.blocksize.capital/auth/open/sign-up
The Blocksize 30-Minute VWAP Feed provides precise, time-anchored pricing snapshots for digital assets, updated every 30 minutes around the clock. Designed for use cases where regular and unbiased price reference points are essential — such as portfolio valuation, fund NAV calculation, settlement, or compliance reporting — this feed offers volume-weighted average prices based on executed trades across a broad and continuously vetted set of exchanges.
Each pricing point is calculated using trade data observed during the 30-minute interval immediately preceding each half-hour mark (e.g., 00:30, 01:00, 01:30 UTC, etc.). For each interval, the final price is derived from the volume-weighted average of the last trade events on all reporting exchanges. This method ensures that higher-volume trades contribute more significantly to the resulting price, offering a fair and liquidity-sensitive reflection of market value.
To ensure accuracy and data integrity, only validated trade events with complete volume, price, and timestamp information are considered. Any incomplete, malformed, or delayed exchange data is automatically excluded from the calculation. In the rare event that no valid data is available for a given interval, the feed defaults to the last available valid price to preserve pricing continuity — a critical feature for settlement systems and automated pipelines.
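As a rough illustration of the interval logic described above (not Blocksize's implementation), a single 30-minute VWAP with the last-valid-price fallback might look like this; the trade records are assumed to be already validated and filtered to the interval:

def interval_vwap(trades, last_valid_price):
    # trades: list of (price, volume) pairs observed during the 30-minute window.
    total_volume = sum(volume for _, volume in trades)
    if total_volume == 0:
        # No valid trades in this interval: fall back to preserve pricing continuity.
        return last_valid_price
    return sum(price * volume for price, volume in trades) / total_volume

# Example: three trades observed in the 30 minutes before 01:00 UTC.
print(interval_vwap([(101.0, 2.0), (100.5, 1.0), (99.8, 0.5)], last_valid_price=100.2))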
The feed also benefits from active oversight and quality assurance by Blocksize’s internal data committee. Exchanges that show recurring anomalies or inconsistencies are removed from the input set until verified corrections are made, while new sources are added only after rigorous integrity checks. This combination of automation, governance, and data hygiene ensures that the 30-minute VWAP feed remains a trusted pricing oracle for digital asset markets, even during volatile or low-liquidity periods.
Questions? Reach out to our qualified data team.
PII Statement: Our datasets do not include personal, pseudonymized, or sensitive user data.