100+ datasets found

student data analysis
kaggle.com
Updated Nov 17, 2023
Cite
maira javeed (2023). student data analysis [Dataset]. https://www.kaggle.com/datasets/mairajaveed/student-data-analysis
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 17, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
maira javeed
Description
In this project, we aim to analyze and gain insights into the performance of students based on various factors that influence their academic achievements. We have collected data related to students' demographic information, family background, and their exam scores in different subjects.

Key Objectives:

Performance Evaluation: Evaluate and understand the academic performance of students by analyzing their scores in various subjects.

Identifying Underlying Factors: Investigate factors that might contribute to variations in student performance, such as parental education, family size, and student attendance.

Visualizing Insights: Create data visualizations to present the findings effectively and intuitively.

Dataset Details:

The dataset used in this analysis contains information about students, including their age, gender, parental education, lunch type, and test scores in subjects like mathematics, reading, and writing.

Analysis Highlights:

We will perform a comprehensive analysis of the dataset, including data cleaning, exploration, and visualization to gain insights into various aspects of student performance.

By employing statistical methods and machine learning techniques, we will determine the significant factors that affect student performance.

Why This Matters:

Understanding the factors that influence student performance is crucial for educators, policymakers, and parents. This analysis can help in making informed decisions to improve educational outcomes and provide support where it is most needed.

Acknowledgments:

We would like to express our gratitude to [mention any data sources or collaborators] for making this dataset available.

Please Note:

This project is meant for educational and analytical purposes. The dataset used is fictitious and does not represent any specific educational institution or individuals.
Stock Prices Dataset
brightdata.com
.json, .csv, .xlsx
Updated Dec 2, 2024
+ more versions
Cite
Bright Data (2024). Stock Prices Dataset [Dataset]. https://brightdata.com/products/datasets/financial/stock-price
Available download formats: .json, .csv, .xlsx
Dataset updated
Dec 2, 2024
Dataset authored and provided by
Bright Data
License
https://brightdata.com/license
Area covered
Worldwide
Description
Use our Stock prices dataset to access comprehensive financial and corporate data, including company profiles, stock prices, market capitalization, revenue, and key performance metrics. This dataset is tailored for financial analysts, investors, and researchers to analyze market trends and evaluate company performance.

Popular use cases include investment research, competitor benchmarking, and trend forecasting. Leverage this dataset to make informed financial decisions, identify growth opportunities, and gain a deeper understanding of the business landscape. The dataset includes all major data points: company name, company ID, summary, stock ticker, earnings date, closing price, previous close, opening price, and much more.
Streamflow-gain- and streamflow-loss data for streamgages in the Central...
catalog.data.gov
data.usgs.gov
+3 more
Updated Oct 5, 2024
+ more versions
Cite
U.S. Geological Survey (2024). Streamflow-gain- and streamflow-loss data for streamgages in the Central Valley Hydrologic Model [Dataset]. https://catalog.data.gov/dataset/streamflow-gain-and-streamflow-loss-data-for-streamgages-in-the-central-valley-hydrologic-
Dataset updated
Oct 5, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Central Valley
Description
This digital dataset contains 61 sets of annual streamflow gains and losses between 1961 and 1977 along Central Valley surface-water network for the Central Valley Hydrologic Model (CVHM). The Central Valley encompasses an approximate 50,000 square-kilometer region of California. The complex hydrologic system of the Central Valley is simulated using the USGS's numerical modeling code MODFLOW-FMP (Schmid and others, 2006). This simulation is referred to here as the CVHM (Faunt, 2009). Utilizing MODFLOW-FMP, the CVHM simulates groundwater and surface-water flow, irrigated agriculture, land subsidence, and other key processes in the Central Valley on a monthly basis from 1961-2003. The total active modeled area is 20,334 square-miles. The CVHM includes complex surface-water management processes. The hydrology of the present-day Central Valley and the CVHM model are driven by surface-water deliveries and associated groundwater pumpage. The Streamflow Routing Package (SFR1) is linked to MODFLOW-FMP to facilitate the simulated conveyance of surface-water deliveries. If surface-water deliveries do not meet the farm-delivery requirement, the FMP invokes simulated groundwater pumping to meet the demand. The surface-water network represents a subset of the entire stream network in the valley. Quantitative observations of streamflow gains and losses were available for 57 reaches of 20 major stream systems in the Central Valley for water years 1961-77 (Mullen and Nady, 1985). These observations were included in parameter estimation process and in the model-fit statistics. The CVHM is the most recent regional-scale model of the Central Valley developed by the U.S. Geological Survey (USGS). The CVHM was developed as part of the USGS Groundwater Resources Program (see "Foreword", Chapter A, page iii, for details).
Dataset: Adult Age Differences in Remembering Gain- and Loss-Related...
data.niaid.nih.gov
Updated Jun 11, 2021
Cite
Freund, Alexandra (2021). Dataset: Adult Age Differences in Remembering Gain- and Loss-Related Intentions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4923320
Dataset updated
Jun 11, 2021
Dataset provided by
Horn, Sebastian
Freund, Alexandra
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data used in the analyses of the Registered Report: Horn, S. & Freund, A. Adult Age Differences in Remembering Gain- and Loss-Related Intentions. Cognition and Emotion.
LinkedIn Datasets
brightdata.com
.json, .csv, .xlsx
Updated Mar 27, 2025
Cite
Bright Data (2025). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
Available download formats: .json, .csv, .xlsx
Dataset updated
Mar 27, 2025
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

Dataset Features

Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.

Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.

Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.

Customizable Subsets for Specific Needs

Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

Popular Use Cases

Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.

Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.

Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.

Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.

AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
Netflix movies and tv shows dataset
crawlfeeds.com
csv, zip
Updated Jul 1, 2025
Cite
Crawl Feeds (2025). Netflix movies and tv shows dataset [Dataset]. https://crawlfeeds.com/datasets/netflix-movies-and-tv-shows-dataset
Available download formats: zip, csv
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Dive into the Netflix Movies and TV Shows Dataset, a detailed collection of web-scraped data featuring popular streaming titles. Discover trending movies, binge-worthy TV series, genres, ratings, release years, and audience preferences. Gain insights into Netflix originals, global streaming trends, and viewer favorites to inform market analysis and entertainment research.

Perfect for exploring content diversity, production trends, and streaming platform dynamics.
NBA WNBA play-by-play and shots data
kaggle.com
zip
Updated Jun 26, 2025
Cite
Vladislav Shufinskiy (2025). NBA WNBA play-by-play and shots data [Dataset]. https://www.kaggle.com/datasets/brains14482/nba-playbyplay-and-shotdetails-data-19962021
Available download formats: zip (1683596108 bytes)
Dataset updated
Jun 26, 2025
Authors
Vladislav Shufinskiy
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The NBA and WNBA dataset is a large-scale play-by-play and shot-detail dataset covering both NBA and WNBA games, collected from multiple public sources (e.g., official league APIs and stats sites). It provides every in-game event, from period starts, jump balls, fouls, turnovers, rebounds, and field-goal attempts through free throws, along with detailed shot metadata (shot location, distance, result, assisting player, etc.).

You can also download the dataset from GitHub or Google Drive.

Tutorials

NBA play-by-play dataset R example

I will be grateful for ratings and stars on GitHub, but the best thanks is using the dataset in your projects.

Useful links:

nba-on-court: package for working with NBA and WNBA play-by-play data

Ryan Davis: Analyze the Play by Play Data

Python nba_api package for working with the NBA API: https://github.com/swar/nba_api

R hoopR package for working with the NBA API: https://hoopr.sportsdataverse.org/

Motivation

I made this dataset because I want to simplify and speed up work with play-by-play data so that researchers spend their time studying data, not collecting it. Due to the limits on requests on the NBA and WNBA website, and also because you can get play-by-play of only one game per request, collecting this data is a very long process.

Using this dataset, you can reduce the time to get information about one season from a few hours to a couple of seconds and spend more time analyzing data or building models.

I also added play-by-play information from other sources: pbpstats.com, data.nba.com, cdnnba.com. This data will enrich information about the progress of each game and hopefully add opportunities to do interesting things.

Contact Me

If you have any questions or suggestions about the dataset, you can write to me through whichever channel is convenient for you:

LinkedIn

GitHub

X

Telegram
TESLA Inc Last 5 Years Stock Historical Data
kaggle.com
Updated Jun 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jims Chacko (2023). TESLA Inc Last 5 Years Stock Historical Data [Dataset]. https://www.kaggle.com/jimschacko/tesla-inc-last-5-years-stock-historical-data/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 22, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jims Chacko
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Explore the fascinating journey of Tesla's stock performance over the past 5 years and gain valuable insights into its growth, trends, and market behavior. Tesla, the renowned electric vehicle manufacturer, has captured the world's attention with its groundbreaking innovations and exponential growth. In this blog post, we will dive into Tesla's stock performance over the past five years, unraveling key trends and providing valuable insights for investors and enthusiasts alike.

The dataset holds Tesla stock prices from the last 5 years.

Date: The first column; the trading date.

Open: Tesla stock opening price for the given date.

High: Highest price Tesla stock reached on the given date.

Low: Lowest Tesla stock price for the given date.

adj Close: Adjusted closing price of Tesla stock after taking dividends, stock splits, and new stock offerings into account.

Volume: Amount of Tesla stock that changed hands over the course of the trading day.

Source: https://finance.yahoo.com
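
As a rough illustration of how the columns listed above might be used, the following Python sketch computes the day-over-day percentage change in the adjusted close. The sample rows are invented placeholders, not values from the dataset, and the header spelling `Adj Close` and the function name `daily_returns` are assumptions made for this example.

```python
import csv
import io

# Hypothetical sample in the column layout described above. The numbers are
# made up for illustration and are NOT taken from the actual dataset.
SAMPLE_CSV = """Date,Open,High,Low,Adj Close,Volume
2023-01-03,100.0,110.0,95.0,105.0,1000
2023-01-04,105.0,112.0,101.0,110.25,1200
2023-01-05,110.0,111.0,99.0,99.225,1500
"""


def daily_returns(csv_text, price_column="Adj Close"):
    """Return (date, percent change versus the previous row) pairs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    returns = []
    # Pair each row with its predecessor; the first row has no return.
    for prev, curr in zip(rows, rows[1:]):
        prev_price = float(prev[price_column])
        curr_price = float(curr[price_column])
        pct_change = (curr_price - prev_price) / prev_price * 100.0
        returns.append((curr["Date"], pct_change))
    return returns


for date, pct in daily_returns(SAMPLE_CSV):
    print(f"{date}: {pct:+.2f}%")
```

If the real file spells the header differently (for example `adj Close`), pass that spelling as `price_column`.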
Data for: A global-scale dataset of direct natural groundwater recharge...
opendata.eawag.ch
Updated Jun 9, 2020
+ more versions
Cite
(2020). Data for: A global-scale dataset of direct natural groundwater recharge rates: A review of variables, processes and relationships - Package - ERIC [Dataset]. https://opendata.eawag.ch/dataset/globalscale_groundwater_moeck
Dataset updated
Jun 9, 2020
Description
Groundwater recharge indicates the existence of renewable groundwater resources and is therefore an important component in sustainability studies. However, recharge is also one of the least understood, largely because it varies in space and time and is difficult to measure directly. For most studies, only a relatively small number of measurements is available, which hampers a comprehensive understanding of processes driving recharge and the validation of hydrogeological model formulations for small- and large-scale applications. We present a new global recharge dataset encompassing >5000 locations. In order to gain insights into recharge processes, we provide a systematic analysis between the dataset and other global-scale datasets, such as climatic or soil-related parameters. Precipitation rates and seasonality in temperature and precipitation were identified as the most important variables in predicting recharge. The high dependency of recharge on climate indicates its sensitivity to climate change. We also show that vegetation and soil structure have an explanatory power for recharge. Since these conditions can be highly variable, recharge estimates based only on climatic parameters may be misleading. The freely available dataset offers diverse possibilities to study recharge processes from a variety of perspectives. By noting the existing gaps in understanding, we hope to encourage the community to initiate new research into recharge processes and subsequently make recharge data available to improve recharge predictions.
Data_WP5_3_Monetary valuation of impacts and cost-benefit analysis
data.niaid.nih.gov
data.europa.eu
Updated Mar 23, 2021
Cite
Tim Taylor (2021). Data_WP5_3_Monetary valuation of impacts and cost-benefit analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4605801
Dataset updated
Mar 23, 2021
Dataset authored and provided by
Tim Taylor
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes the results of the cost-effectiveness and cost-benefit analysis performed for each of the policy/measure options defined in Task 5.1. As such it complements the dataset “Data_WP5_2_Health effects at the community level data set” in providing the assessment of the economic dimension for the same policies/measures. Cost-effectiveness analysis examined the costs of these options and calculated, for example, the cost per ton of CO2eq. For the cost-benefit analysis the dataset includes all benefits, damages and costs and the non-monetary (intangible) items which were transformed into monetary values (where possible) including social costs, monetized health impacts, monetized contributions to climate change, utility and gain losses. The full dataset is organized in three different files according to the sector addressed by the policy/measure options analyzed. In this light, the file named “CBA active transport” includes the full results for the active transport policies; the file named “CBA alternative fuel vehicles” the results for all the alternative fuel vehicles policies; and the file “CBA energy efficiency” the results for the energy efficiency policies. Each file includes multiple worksheets: one summarizing all the CBA results for the policy sector addressed, and others with the detailed results for each specific policy up to the year 2040. The data are available in MS Excel xls(x) format to ensure full interoperability, allowing easy parsing and information exchange.
Data from: Dataset to accompany genomics combined with UAS data enhances...
rex.libraries.wsu.edu
csv, gz
Updated Dec 14, 2022
Cite
Osval A. Montesinos-López; Andrew W. Herr; Jose Crossa; Arron H. Carter (2022). Dataset to accompany genomics combined with UAS data enhances prediction of grain yield in winter wheat [Dataset]. https://rex.libraries.wsu.edu/esploro/outputs/dataset/Dataset-to-accompany-genomics-combined-with/99900914641301842
Available download formats: gz (67338119 bytes), csv (3968871 bytes)
Dataset updated
Dec 14, 2022
Dataset provided by
Washington State University
Authors
Osval A. Montesinos-López; Andrew W. Herr; Jose Crossa; Arron H. Carter
Time period covered
2022
Description
With the human population continuing to increase worldwide, there is pressure to employ novel technologies to increase genetic gain in plant breeding programs that contribute to nutrition and food security. Genomic selection (GS) has the potential to increase genetic gain because it can accelerate the breeding cycle, increase the accuracy of estimated breeding values, and improve selection accuracy. However, with recent advances in high throughput phenotyping in plant breeding programs, the opportunity to integrate genomic and phenotypic data to increase prediction accuracy is present. In this paper, we applied GS to winter wheat data integrating two types of inputs: genomic and phenotypic. We observed the best prediction performance when combining both genomic and phenotypic inputs, while only using genomic information fared poorly. Interestingly, using only phenotypic information was slightly worse in some cases than the combination of both sources, whereas in other cases, using only phenotypic information provided the best prediction performance. Our results are encouraging because it is clear we can enhance the prediction accuracy of GS by integrating more related inputs in the models. Included here are: A .csv file with field trait and drone data from 2018 through 2022 used in model analysis. A .vcf file with genotype by sequencing (gbs) data of all tested wheat lines between 2015 and 2022. This data was also used in model analysis.
College Placement Predictor Dataset
kaggle.com
Updated Dec 28, 2023
Cite
SameerProgrammer (2023). College Placement Predictor Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7298157
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7298157
Dataset updated
Dec 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
SameerProgrammer
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
1. About the Dataset:

Description: Dive into the world of college placements with this dataset designed to unravel the factors influencing student placement outcomes. The dataset comprises crucial parameters such as IQ scores, CGPA (Cumulative Grade Point Average), and placement status. Aspiring data scientists, researchers, and enthusiasts can leverage this dataset to uncover patterns and insights that contribute to a deeper understanding of successful college placements.

2. Project Ideas:

Project Idea 1: Predictive Modeling for College Placements. Utilize machine learning algorithms to build a predictive model that forecasts a student's likelihood of placement based on their IQ scores and CGPA. Evaluate and compare the effectiveness of different algorithms to enhance prediction accuracy.

Project Idea 2: Feature Importance Analysis. Conduct a feature importance analysis to identify the key factors that significantly influence placement outcomes. Gain insights into whether IQ, CGPA, or a combination of both plays a more dominant role in determining success.

Project Idea 3: Clustering Analysis of Placement Trends. Apply clustering techniques to group students based on their placement outcomes. Explore whether distinct clusters emerge, shedding light on common characteristics or trends among students who secure placements.

Project Idea 4: Correlation Analysis with External Factors. Investigate the correlation between the provided data (IQ, CGPA, placement) and external factors such as internship experience, extracurricular activities, or industry demand. Assess how these external factors may complement or influence placement success.

Project Idea 5: Visualization of Placement Dynamics Over Time. Create dynamic visualizations to illustrate how placement trends evolve over time. Analyze trends, patterns, and fluctuations in placement rates to identify potential cyclical or seasonal influences on student placements.

3. Columns Explanation:

IQ:

Definition: Intelligence Quotient, a measure of a person's intellectual abilities.

Data Type: Numeric

Range: Typically, IQ scores range from 70 to 130, with 100 being the average.

CGPA:

Definition: Cumulative Grade Point Average, a measure of a student's overall academic performance.

Data Type: Numeric

Range: Typically, CGPA is on a scale of 0 to 4, with 4 being the highest possible score.

Placement:

Definition: Binary variable indicating whether a student secured a placement (1) or not (0).

Data Type: Categorical (Binary)

Values: 1 (Placement secured) or 0 (No placement).

These columns collectively provide a comprehensive snapshot of a student's intellectual abilities, academic performance, and their success in securing a placement. Analyzing this dataset can offer valuable insights into the dynamics of college placements and inform strategies for optimizing student outcomes.
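
The column descriptions above can be turned into a small sanity check before any modeling. The sketch below is a minimal Python illustration under stated assumptions: the records are invented, the key names `IQ`, `CGPA`, and `Placement` mirror the column names listed above, and the helpers `validate` and `placement_rate` are hypothetical names introduced here. Because the IQ range is described only as typical, it is not enforced as a hard bound.

```python
# Hypothetical records in the documented column layout (IQ, CGPA, Placement).
# The values are invented for illustration only.
SAMPLE = [
    {"IQ": 95, "CGPA": 2.8, "Placement": 0},
    {"IQ": 110, "CGPA": 3.6, "Placement": 1},
    {"IQ": 102, "CGPA": 3.1, "Placement": 1},
    {"IQ": 88, "CGPA": 2.4, "Placement": 0},
]


def validate(record):
    """Check one record against the column descriptions above."""
    return (
        isinstance(record["IQ"], (int, float))   # numeric IQ score
        and 0.0 <= record["CGPA"] <= 4.0         # CGPA on a 0 to 4 scale
        and record["Placement"] in (0, 1)        # binary placement flag
    )


def placement_rate(records, min_cgpa):
    """Share of students with CGPA >= min_cgpa who secured a placement."""
    selected = [r for r in records if r["CGPA"] >= min_cgpa]
    if not selected:
        return None  # no students in this band, so the rate is undefined
    return sum(r["Placement"] for r in selected) / len(selected)


print(all(validate(r) for r in SAMPLE))
print(placement_rate(SAMPLE, min_cgpa=3.0))
```

Comparing `placement_rate` across a few CGPA thresholds is a quick first look at the relationship that Project Idea 1 would then model properly.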
Fantasy Premier League Player Data (2016-2024)
kaggle.com
Updated May 14, 2024
Cite
Reeve Barreto (2024). Fantasy Premier League Player Data (2016-2024) [Dataset]. https://www.kaggle.com/datasets/reevebarreto/fantasy-premier-league-player-data-2016-2024
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 14, 2024
Dataset provided by
Kaggle
Authors
Reeve Barreto
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides an archive of Fantasy Premier League (FPL) player performance data for eight seasons, spanning from 2016-2024.

The data was originally collected from https://github.com/vaastav/Fantasy-Premier-League, a public repository for FPL data.

The dataset has been meticulously cleaned and processed to ensure accuracy and consistency. This may include handling missing values, correcting inconsistencies, and standardizing formats.

The dataset includes a wide range of player statistics captured on a gameweek-by-gameweek basis. This allows you to analyze trends, identify patterns, and gain valuable insights into player performance.

This dataset can be a powerful tool for FPL enthusiasts and data scientists alike. Here are some potential applications:
- Trend Analysis: Identify historical trends in player performance across different seasons and positions.
- Predictive Modeling: Develop machine learning models to predict player points, performance, and transfers.
- Informed Team Selection: Make data-driven decisions to optimize your FPL team for each gameweek.
- Comparative Analysis: Compare player statistics across seasons and positions to uncover hidden gems and potential breakout stars.

Using this dataset, you can gain a deeper understanding of FPL player performance and enhance your decision-making for the upcoming season.
Multimodal SKEP dataset for attention regulation behaviors, knowledge gain,...
data.4tu.nl
zip
Updated Apr 21, 2023
Cite
Yoon Lee; Marcus Specht (2023). Multimodal SKEP dataset for attention regulation behaviors, knowledge gain, perceived learning experience, and perceived social presence in e-learning with a conversational agent [Dataset]. http://doi.org/10.4121/4c9de645-ca88-4b45-8fc7-2fc325f191dc.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/4c9de645-ca88-4b45-8fc7-2fc325f191dc.v1
Dataset updated
Apr 21, 2023
Dataset provided by
4TU.ResearchData
Authors
Yoon Lee; Marcus Specht
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reading on digital devices has become more commonplace, often challenging learners' attention. In this study, we hypothesized that allowing learners to reflect on their reading phases with an empathic social robot companion might enhance learners' attention in e-reading. To verify our assumption, we collected a novel SKEP dataset in an e-reading setting with social robot support.

We designed two interfaces: 1) a GUI-based system with a monitor, mouse, and eye tracker implemented, and 2) an HRI-based system, which has a monitor, mouse, eye tracker, and Furhat Robot as physical components. See the footnote to check the specification of the Pupil Core eye tracker and Logitech C505 HD Webcam that was implemented. For both conditions, an informative e-reading material with technicality, "Waste management and critical raw materials," has been provided through a screen-based reader, which we explicitly developed for this study. The content has been chosen, aiming for an equal baseline knowledge for general readers. The text contains 4,750 words, divided into 29 pages covering seven subtopics. The text has been implemented with 47pt on a 27-inch monitor, having 2560*1440 resolution. The setting was optimized for the eye tracker implementation, which requires a bigger font size than the usual PDF readers for high-resolution data collection.

We implemented four measurements that are direct and indirect attentional cues. Data features and granularity vary based on the data collection methods, collection timing, and data post-processing. Learners' self-regulatory behavior was collected through a video feed and annotated second by second, post hoc, by human labelers. Labels are observable behavioral cues that indicate learners' attentional shifts. Movements of the 1) eyebrow, 2) blink, 3) mumble, 4) hands, and 5) body work as good predictors of learners' self-awareness of attention loss; we annotated 60 video samples by applying six labels, including 6) neutral state as opposed to the five attention regulation behavior labels. Additionally, we examined multimodal cues that are direct and indirect clues of attention: knowledge gain, perceived learning experience, and perceived social presence with interfaces (see readme.txt for descriptions of indicators).
Global Salt Marsh Change, 2000-2019 - Dataset - NASA Open Data Portal
data.staging.idas-ds1.appdat.jsc.nasa.gov
data.nasa.gov
Updated Mar 20, 2025
+ more versions
Cite
nasa.gov (2025). Global Salt Marsh Change, 2000-2019 - Dataset - NASA Open Data Portal [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/global-salt-marsh-change-2000-2019-bc1eb
Dataset updated
Mar 20, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
This dataset provides global salt marsh change, including loss and gain for five-year periods from 2000-2019. Loss and gain at a 30 m spatial resolution were estimated with Normalized Difference Vegetation Index (NDVI) anomaly algorithm using Landsat 5, 7, and 8 collections within the known extent of salt marshes. The data are provided in cloud-optimized GeoTIFF format.
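
For readers unfamiliar with NDVI, the index is conventionally computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The Python sketch below shows that standard formula plus one simple way an anomaly could be defined, as the difference from a baseline mean. The description above does not specify the dataset's actual anomaly algorithm, so `ndvi_anomaly` is only an assumed illustration, and the reflectance values are invented.

```python
def ndvi(nir, red):
    """Standard NDVI: (NIR - Red) / (NIR + Red), in the range [-1, 1]."""
    total = nir + red
    if total == 0:
        return None  # undefined when both bands are zero
    return (nir - red) / total


def ndvi_anomaly(current, baseline_values):
    """Assumed anomaly definition: current NDVI minus the baseline mean.

    This is an illustrative choice, not the algorithm used by the dataset.
    """
    baseline_mean = sum(baseline_values) / len(baseline_values)
    return current - baseline_mean


# Invented reflectance values for a single pixel.
current = ndvi(nir=0.30, red=0.20)
baseline = [ndvi(0.50, 0.10), ndvi(0.45, 0.15)]
print(ndvi_anomaly(current, baseline))
```

A strongly negative anomaly at a pixel inside the known salt marsh extent would be the kind of signal a loss-detection method looks for, while a positive one would suggest gain.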
Fruits Dataset for Classification
data.mendeley.com
Updated Feb 11, 2025
+ more versions
Cite
GTS GTS (2025). Fruits Dataset for Classification [Dataset]. http://doi.org/10.17632/rg254yr63x.1
Unique identifier
https://doi.org/10.17632/rg254yr63x.1
Dataset updated
Feb 11, 2025
Authors
GTS GTS
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
About Dataset (strawberries, peaches, pomegranates)
Photo requirements: 1) white background, 2) .jpg format, 3) image size 300×300.
The number of photos required is 250 of each fruit when it is fresh and 250 of each fruit when it is rotten, for a total of 1,500 images.

Diverse Collection: With a diverse collection of product images, the files provide an excellent foundation for developing and testing machine learning models designed for image recognition and allocation. Each image is captured under different lighting conditions and backgrounds, offering a realistic challenge for algorithms to overcome.

Real-World Applications: The variability in the dataset helps models trained on it generalize to real-world scenarios, making them more robust and reliable. The dataset includes common fruits such as apples, bananas, oranges, and strawberries, among others, allowing for comprehensive training and evaluation.

Industry Use Cases: One of the significant advantages of the Fruits Dataset for Classification is its applicability in fields such as agriculture, retail, and the food industry. In agriculture, it can help automate fruit sorting and grading, enhancing efficiency and reducing labor costs. In retail, it can be used to develop automated checkout systems that accurately identify fruits, streamlining the purchasing process.

Educational Value: The dataset is also valuable for educational purposes, providing students and educators with a practical tool to learn and teach machine learning concepts. By working with this dataset, learners can gain hands-on experience in data preprocessing, model training, and evaluation.

Conclusion: The Fruits Dataset for Classification is a versatile resource for image classification work. Its diverse images, coupled with practical applications, make it a useful dataset for researchers, developers, and educators working in machine learning and computer vision.

This dataset is sourced from Kaggle.
Orange dataset table
figshare.com
xlsx
Updated Mar 4, 2022
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19146410.v1
Dataset updated
Mar 4, 2022
Dataset provided by
figshare
Authors
Rui Simões
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples prevented a full and robust statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
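The decision tree settings above name gain ratio as the split criterion. As a minimal sketch of what that criterion measures, the snippet below computes information gain divided by split information for a single categorical feature. It assumes the feature has already been discretized, uses made-up labels, and is not Orange's implementation; the function names are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the split, normalized by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels after splitting on the feature.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Entropy of the feature itself penalizes many-valued splits.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical discretized feature and class labels.
feature = ["low", "low", "high", "high"]
classes = ["control", "control", "treated", "treated"]
print(gain_ratio(feature, classes))
```

A feature that separates the classes perfectly scores highest, while a feature unrelated to the classes scores zero.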
Political Tweets Dataset
brightdata.com
.json, .csv, .xlsx
Updated Dec 23, 2024
Bright Data (2024). Political Tweets Dataset [Dataset]. https://brightdata.com/products/datasets/twitter/tweets/political
Available download formats: .json, .csv, .xlsx
Dataset updated
Dec 23, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our Political Tweets dataset to enhance campaign strategies and gain insights into public discourse. This dataset offers a comprehensive view of political dynamics on social media, empowering organizations, researchers, and policymakers to analyze trends and sentiment. Access the full dataset or customize it with specific data points tailored to your needs.

Popular use cases include:

Sentiment Analysis: Analyze publicly available political tweets to understand public sentiment on policies, events, and candidates, aiding campaign strategies and opinion research.
Trend Monitoring: Track trending topics and hashtags in political discourse to identify key issues and shifts in public priorities across demographics.
Misinformation Detection: Detect and analyze patterns of misinformation, supporting efforts to combat its spread effectively.

Harness these insights to stay informed and adapt to the evolving political landscape.
McKenzie County, ND annual median income by work experience and sex dataset:...
neilsberg.com
csv, json
Updated Feb 27, 2025
Neilsberg Research (2025). McKenzie County, ND annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/mckenzie-county-nd-income-by-gender/
Available download formats: json, csv
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
McKenzie County, North Dakota
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in McKenzie County. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

Key observations: Insights from 2023

Based on our analysis of ACS 2019-2023 5-Year Estimates, we present the following observations:
- All workers, aged 15 years and older: In McKenzie County, the median income for all workers aged 15 years and older, regardless of work hours, was $70,683 for males and $45,098 for females.
These income figures highlight a substantial gender-based income gap in McKenzie County. Women, regardless of work hours, earn 64 cents for each dollar earned by men. This gender pay gap of approximately 36% underscores concerning gender-based income inequality in McKenzie County.
- Full-time workers, aged 15 years and older: In McKenzie County, among full-time, year-round workers aged 15 years and older, males earned a median income of $82,314, while females earned $52,974, leading to a 36% gender pay gap among full-time workers. This illustrates that women earn 64 cents for each dollar earned by men in full-time roles. This level of income gap emphasizes the urgency of addressing this ongoing disparity, in which women working full-time still earn substantially less than men in the same employment category.
Across all roles, including non-full-time employment, women faced a similar gender pay gap percentage. This indicates a consistent gender pay gap across employment types in McKenzie County.
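The cents-per-dollar and pay-gap percentages quoted above can be reproduced from the stated medians. The short sketch below shows the arithmetic; the helper name `pay_gap` is hypothetical and is not part of the dataset or its documentation.

```python
def pay_gap(male_median, female_median):
    """Return (cents earned by women per dollar earned by men, gap in percent)."""
    ratio = female_median / male_median
    return round(ratio * 100), round((1 - ratio) * 100)


# Median incomes quoted in the observations above.
all_workers = pay_gap(70_683, 45_098)
full_time = pay_gap(82_314, 52_974)

print(all_workers)  # (64, 36)
print(full_time)    # (64, 36)
```

Both groups round to the same figures, which matches the description's remark that the gap is similar across employment types.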

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.

Gender classifications include:

Male

Female

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Variables / Data Columns

Year: This column presents the data year. Expected values are 2010 to 2023

Male Total Income: Annual median income, for males regardless of work hours

Male FT Income: Annual median income, for males working full time, year-round

Male PT Income: Annual median income, for males working part time

Female Total Income: Annual median income, for females regardless of work hours

Female FT Income: Annual median income, for females working full time, year-round

Female PT Income: Annual median income, for females working part time

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for McKenzie County median household income by race. You can refer to the same here
JPL GRACE and GRACE-FO Mascon Ocean, Ice, and Hydrology Equivalent Water...
podaac.jpl.nasa.gov
html
Updated Sep 15, 2015
PO.DAAC (2015). JPL GRACE and GRACE-FO Mascon Ocean, Ice, and Hydrology Equivalent Water Height Coastal Resolution Improvement (CRI) Filtered Release 06 Version 02 [Dataset]. http://doi.org/10.5067/TEMSC-3JC62
Available download formats: html
Unique identifier
https://doi.org/10.5067/TEMSC-3JC62
Dataset updated
Sep 15, 2015
Dataset provided by
PO.DAAC
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Apr 4, 2002 - Present
Variables measured
GRAVITY ANOMALIES, SEA LEVEL, SEA LEVEL RISE
Description
This dataset contains gridded monthly global water storage/height anomalies relative to a time-mean, derived from GRACE and GRACE-FO and processed at JPL using the Mascon approach (Version2/RL06). These data are provided in a single data file in netCDF format, and can be used for analysis for ocean, ice, and hydrology phenomena. This version of the data employs a Coastal Resolution Improvement (CRI) filter that reduces signal leakage errors across coastlines. The water storage/height anomalies are given in equivalent water thickness units (cm). The solution provided here is derived from solving for monthly gravity field variations in terms of geolocated spherical cap mass concentration functions, rather than global spherical harmonic coefficients. Additionally, realistic geophysical information is introduced during the solution inversion to intrinsically remove correlated error. Thus, these Mascon grids do not need to be destriped or smoothed, like traditional spherical harmonic gravity solutions. The complete Mascon solution consists of 4,551 relatively independent estimates of surface mass change that have been derived using an equal-area 3-degree grid of individual mascons. A subset of these individual mascons span coastlines, and contain mixed land and ocean mass change signals. In a post-processing step, the CRI filter is applied to those mixed land/ocean Mascons to separate land and ocean mass. The land mask used to perform this separation is provided in the same directory as this dataset. Since the individual mascons act as an inherent smoother on the gravity field, a set of optional gain factors (for continental hydrology applications) that can be applied to the solution to study mass change signals at sub-mascon resolution is also provided within the same data directory as the Mascon data. Please refer to the 'Data Access' tab at the top of this page to gain direct access to the Mascon data. 
For more information, please visit https://grace.jpl.nasa.gov/data/get-data/jpl_global_mascons/. For a detailed description on the Mascon solution, including the mathematical derivation, implementation of geophysical constraints, and solution validation, please see Watkins et al., 2015, doi: 10.1002/2014JB011547. For a detailed description of the CRI filter implementation, please see Wiese et al., 2016, doi:10.1002/2016WR019344.

student data analysis

Student Performance Analysis


Acknowledgments:

We would like to express our gratitude to [mention any data sources or collaborators] for making this dataset available.

Please Note:

This project is meant for educational and analytical purposes. The dataset used is fictitious and does not represent any specific educational institution or individuals.
