30 datasets found
  1. Student Performance & Behavior Dataset

    • kaggle.com
    zip
    Updated May 28, 2025
    Cite
    Mahmoud Elhemaly (2025). Student Performance & Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/mahmoudelhemaly/students-grading-dataset
    Explore at:
    Available download formats: zip (1020509 bytes)
    Dataset updated
    May 28, 2025
    Authors
    Mahmoud Elhemaly
    Description

    Student Performance & Behavior Dataset

    This dataset contains 5,000 real records collected from a private learning provider. The dataset includes key attributes necessary for exploring patterns, correlations, and insights related to academic performance.

    Columns:
    01. Student_ID: Unique identifier for each student.
    02. First_Name: Student’s first name.
    03. Last_Name: Student’s last name.
    04. Email: Contact email (can be anonymized).
    05. Gender: Male, Female, Other.
    06. Age: The age of the student.
    07. Department: Student's department (e.g., CS, Engineering, Business).
    08. Attendance (%): Attendance percentage (0-100%).
    09. Midterm_Score: Midterm exam score (out of 100).
    10. Final_Score: Final exam score (out of 100).
    11. Assignments_Avg: Average score of all assignments (out of 100).
    12. Quizzes_Avg: Average quiz scores (out of 100).
    13. Participation_Score: Score based on class participation (0-10).
    14. Projects_Score: Project evaluation score (out of 100).
    15. Total_Score: Weighted sum of all grades.
    16. Grade: Letter grade (A, B, C, D, F).
    17. Study_Hours_per_Week: Average study hours per week.
    18. Extracurricular_Activities: Whether the student participates in extracurriculars (Yes/No).
    19. Internet_Access_at_Home: Does the student have access to the internet at home? (Yes/No).
    20. Parent_Education_Level: Highest education level of parents (None, High School, Bachelor's, Master's, PhD).
    21. Family_Income_Level: Low, Medium, High.
    22. Stress_Level (1-10): Self-reported stress level (1: Low, 10: High).
    23. Sleep_Hours_per_Night: Average hours of sleep per night.

    Attendance is not part of the Total_Score, or carries only minimal weight.

    Calculating the weighted sum: Total_Score = a·Midterm + b·Final + c·Assignments + d·Quizzes + e·Participation + f·Projects

    | Component | Weight (%) |
    |:--- |:--- |
    | Midterm | 15% |
    | Final | 25% |
    | Assignments Avg | 15% |
    | Quizzes Avg | 10% |
    | Participation | 5% |
    | Projects Score | 30% |
    | Total | 100% |
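As a rough illustration (not part of the dataset documentation), here is a Python sketch of that weighted sum using the column names listed above. Rescaling Participation_Score from 0-10 to 0-100 before weighting is an assumption, and the file name comes from the note further below.

```python
import pandas as pd

# Component weights from the table above, expressed as fractions.
WEIGHTS = {
    "Midterm_Score": 0.15,
    "Final_Score": 0.25,
    "Assignments_Avg": 0.15,
    "Quizzes_Avg": 0.10,
    "Participation_Score": 0.05,
    "Projects_Score": 0.30,
}

def recompute_total_score(df: pd.DataFrame) -> pd.Series:
    """Weighted sum of the grade components (assumes every component is on a 0-100 scale)."""
    parts = df[list(WEIGHTS)].copy()
    # Participation_Score is described as 0-10, so rescale it to 0-100 first
    # (an assumption about how the published Total_Score was computed).
    parts["Participation_Score"] = parts["Participation_Score"] * 10
    return sum(parts[col] * weight for col, weight in WEIGHTS.items())

# Usage, e.g. to compare against the published Total_Score column:
# df = pd.read_csv("Students_Grading_Dataset_Biased.csv")
# diff = (recompute_total_score(df) - df["Total_Score"]).abs()
# print(diff.describe())
```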

    Dataset contains:
    - Missing values (nulls) in some records (e.g., Attendance, Assignments, or Parent_Education_Level).
    - Bias in some data (e.g., grading: students with high attendance get slightly better grades).
    - Imbalanced distributions (e.g., some departments having more students).

    Note:
    - The dataset is real, but I included some bias to create a greater challenge for my students.
    - Some columns have been masked as the data owner requested.
    - "Students_Grading_Dataset_Biased.csv" contains the biased dataset; "Students Performance Dataset" contains the masked dataset.

  2. OC Waze Data Map

    • hub.arcgis.com
    Updated Apr 25, 2024
    + more versions
    Cite
    OC Public Works (2024). OC Waze Data Map [Dataset]. https://hub.arcgis.com/maps/OCPW::oc-waze-data-map/about
    Explore at:
    Dataset updated
    Apr 25, 2024
    Dataset authored and provided by
    OC Public Works
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Area covered
    Description
    1. OC Waze Partner Hub GeoRSS Cumulative Alert Data from Velocity feed analytics. The data are updated at regular (5-minute) intervals. OC Waze Partner Hub data provide information about traffic jams and events that affect road conditions, either from drivers using Waze (a.k.a. Wazers) or from external sources. Wazers may issue reports from the location at which they are currently located or, if no longer at the location, within 30 minutes after the event occurred. We are also able to provide automatic alerts for what we call Unusual Traffic (or Irregularities): incidents that affect a large number of users and fall outside the normal traffic patterns for a given day and time. Waze generates alert information by processing the following data sources:

    - Data: includes all traffic data reported by Waze users through the Waze mobile application.
    - Reliability: Each alert gets a reliability score based on other user reactions (‘Thumbs up’, ‘Not there’, etc.) and the level of the reporter (Wazers gain levels by contributing to the map, starting at level 1 and reaching up to level 6; the higher the level, the more experienced and trustworthy the Wazer). The score (0-10) indicates how reliable the report is.
    - Confidence: Each alert gets a confidence score based on other user reactions (‘Thumbs up’, ‘Not there’). The score ranges between 0 and 10. A higher score indicates more positive feedback from Waze users.

    2. OC Waze Partner Hub GeoRSS Cumulative Traffic Jam Data from Velocity feed analytics. The data are updated at regular (5-minute) intervals. The traffic jams feed includes data gathered in real time about traffic slowdowns on specific road segments. Waze generates traffic jam information by processing the following data sources:

    - GPS location points sent from user phones (users who drive while using the app) and calculations of the current average speed vs. free-flow speed (the maximum speed measured on the road segment). For Unusual Traffic (irregularities), Waze uses historic average speeds (on 30-minute time slots).
    - User-generated reports: reports shared by Waze users who encounter traffic jams. These appear as regular alerts, and also affect the way we identify and present traffic jams.

    Original data provided by the Waze app. Learn more at Waze.com.
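For illustration only, a Python sketch of filtering such alerts by their reliability and confidence scores. The feed URL and JSON field names below are assumptions, not the documented OC Waze Partner Hub schema.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint: the actual OC Waze Partner Hub feed URL is not given in this listing.
FEED_URL = "https://example.com/oc-waze-partner-hub/alerts.json"

def reliable_alerts(feed_url: str = FEED_URL, min_reliability: int = 6, min_confidence: int = 5) -> list:
    """Keep only alerts whose 0-10 reliability and confidence scores (as described above)
    meet the given thresholds. The JSON field names are assumptions, not a documented schema."""
    with urlopen(feed_url) as response:
        payload = json.load(response)
    return [
        alert for alert in payload.get("alerts", [])
        if alert.get("reliability", 0) >= min_reliability
        and alert.get("confidence", 0) >= min_confidence
    ]
```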
  3. 2006 Commercial Vehicle Survey: origin and destination

    • datasets.ai
    • data.ontario.ca
    • +2more
    21, 8
    Updated Nov 27, 2020
    + more versions
    Cite
    Government of Ontario | Gouvernement de l'Ontario (2020). 2006 Commercial Vehicle Survey: origin and destination [Dataset]. https://datasets.ai/datasets/cc38eb0b-c4fa-4bcb-8551-67c06e6b9500
    Explore at:
    Available download formats: 21, 8
    Dataset updated
    Nov 27, 2020
    Dataset authored and provided by
    Government of Ontario | Gouvernement de l'Ontario
    Description

    General guidelines
    1. The dataset contains trip origin, destination, commodity group, average daily trips, commodity weight and value.
    2. The data represents activity by medium and heavy trucks only.
    3. The origin and destination data is aggregated by counties in Ontario and province or state outside of Ontario.
    4. The commodities are grouped into 32 groups and empty trucks.
    5. The Commercial Vehicle Survey targets travel on provincial facilities. Therefore, coverage of intra-urban trips is incomplete, and should not be interpreted as representative.
    6. Trip activity within the Greater Toronto Area municipalities is not representative.
    7. The average trip distance is 440 km. Caution must be exercised with short distance trip activities.
    8. All origin-destination pairs with average trip activity of less than one trip per day have been suppressed.

    Field descriptions
    - Origin zone: Trip Origin Zone Number. Zone aggregation is counties in Ontario and province/state for others: 35XX - Ontario counties; 70XX - U.S. states; XX00 (except 3500) - Canadian provinces.
    - Origin name: Trip Origin Name (county or province/state name).
    - Destination zone: Trip Destination Zone Number. Zone aggregation is counties in Ontario and province/state for others; same numbering system as origins.
    - Destination name: Trip Destination Name (county or province/state name).
    - Commodity group code: Unique commodity group numeric code.
    - Commodity group: Descriptive name of the commodity group.
    - Daily trips: Average daily truck trips.
    - Commodity weight: Average daily commodity weight in kilograms (kg).
    - Commodity value: Average daily value of the commodity in dollars ($).

    *[km]: kilometre
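As a quick illustration of the zone numbering scheme above, here is a minimal Python sketch; the exact boundaries (for example, how 3500 itself should be treated) are my reading of the description rather than documented rules, and the example zone codes are hypothetical.

```python
def classify_zone(zone: int) -> str:
    """Rough classification of an origin/destination zone number, following the numbering
    scheme in the field descriptions above (boundaries are assumptions)."""
    if 3500 < zone < 3600:                 # 35XX: Ontario counties
        return "Ontario county"
    if 7000 <= zone < 7100:                # 70XX: U.S. states
        return "U.S. state"
    if zone % 100 == 0 and zone != 3500:   # XX00 (except 3500): Canadian provinces
        return "Canadian province"
    return "unknown"

# Hypothetical zone codes for illustration:
# classify_zone(3520) -> "Ontario county"
# classify_zone(7036) -> "U.S. state"
# classify_zone(4800) -> "Canadian province"
```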

  4. Fire statistics data tables

    • gov.uk
    • s3.amazonaws.com
    Updated Oct 23, 2025
    + more versions
    Cite
    Ministry of Housing, Communities and Local Government (2025). Fire statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/fire-statistics-data-tables
    Explore at:
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    GOV.UK
    Authors
    Ministry of Housing, Communities and Local Government
    Description

    On 1 April 2025 responsibility for fire and rescue transferred from the Home Office to the Ministry of Housing, Communities and Local Government.

    This information covers fires, false alarms and other incidents attended by fire crews, and the statistics include the numbers of incidents, fires, fatalities and casualties as well as information on response times to fires. The Ministry of Housing, Communities and Local Government (MHCLG) also collect information on the workforce, fire prevention work, health and safety and firefighter pensions. All data tables on fire statistics are below.

    MHCLG has responsibility for fire services in England. The vast majority of data tables produced by the Ministry of Housing, Communities and Local Government are for England, but some tables (0101, 0103, 0201, 0501, 1401) are for Great Britain, split by nation. In the past the Department for Communities and Local Government (which previously had responsibility for fire services in England) produced data tables for Great Britain and at times the UK. Similar information for devolved administrations is available at Scotland: Fire and Rescue Statistics (https://www.firescotland.gov.uk/about/statistics/), Wales: Community safety (https://statswales.gov.wales/Catalogue/Community-Safety-and-Social-Inclusion/Community-Safety) and Northern Ireland: Fire and Rescue Statistics (https://www.nifrs.org/home/about-us/publications/).

    If you use assistive technology (for example, a screen reader) and need a version of any of these documents in a more accessible format, please email alternativeformats@communities.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Fire statistics guidance
    Fire statistics incident level datasets

    Incidents attended

    FIRE0101: Incidents attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 143 KB), available at https://assets.publishing.service.gov.uk/media/68f0f810e8e4040c38a3cf96/FIRE0101.xlsx. Previous FIRE0101 tables.

    FIRE0102: Incidents attended by fire and rescue services in England, by incident type and fire and rescue authority (MS Excel Spreadsheet, 2.12 MB), available at https://assets.publishing.service.gov.uk/media/68f0ffd528f6872f1663ef77/FIRE0102.xlsx. Previous FIRE0102 tables.

    FIRE0103: Fires attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 197 KB), available at https://assets.publishing.service.gov.uk/media/68f20a3e06e6515f7914c71c/FIRE0103.xlsx. Previous FIRE0103 tables.

    FIRE0104: Fire false alarms by reason for false alarm, England (MS Excel Spreadsheet, 443 KB), available at https://assets.publishing.service.gov.uk/media/68f20a552f0fc56403a3cfef/FIRE0104.xlsx. Previous FIRE0104 tables.

    Dwelling fires attended

    FIRE0201: Dwelling fires attended by fire and rescue services by motive, population and nation (MS Excel Spreadsheet, 192 KB), available at https://assets.publishing.service.gov.uk/media/68f100492f0fc56403a3cf94/FIRE0201.xlsx. Previous FIRE0201 tables.


  5. 2022 Bikeshare Data -Reduced File Size -All Months

    • kaggle.com
    zip
    Updated Mar 8, 2023
    Cite
    Kendall Marie (2023). 2022 Bikeshare Data -Reduced File Size -All Months [Dataset]. https://www.kaggle.com/datasets/kendallmarie/2022-bikeshare-data-all-months-combined
    Explore at:
    Available download formats: zip (98884 bytes)
    Dataset updated
    Mar 8, 2023
    Authors
    Kendall Marie
    License

    Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This is a condensed version of the raw data obtained through the Google Data Analytics Course, made available by Lyft and the City of Chicago under this license (https://ride.divvybikes.com/data-license-agreement).

    I originally did my study in another platform, and the original files were too large to upload to Posit Cloud in full. Each of the 12 monthly files contained anywhere from 100k to 800k rows. Therefore, I decided to reduce the number of rows drastically by performing grouping, summaries, and thoughtful omissions in Excel for each csv file. What I have uploaded here is the result of that process.

    Data is grouped by month, day, rider_type, bike_type, and time_of_day. total_rides represents the number of rides in each grouping (i.e., the total number of original rows that were combined to make the new summarized row), and avg_ride_length is the calculated average of all data in each grouping.

    Be sure to use weighted averages if you want to calculate the mean of avg_ride_length for different subgroups, as the values in this file are already averages of the summarized groups. You can include the total_rides value in your weighted average calculation to weight properly.

    9 Columns:

    - date: year, month, and day in date format; includes all days in 2022.
    - day_of_week: Actual day of week as character. Set up a new sort order if needed.
    - rider_type: values are either 'casual' (those who pay per ride) or 'member' (riders who have annual memberships).
    - bike_type: values are 'classic' (non-electric, traditional bikes) or 'electric' (e-bikes).
    - time_of_day: divides the day into 6 equal time frames, 4 hours each, starting at 12AM. Each individual ride was placed into one of these time frames using the time the ride STARTED, even if the ride was long enough to end in a later time frame. This column was added to help summarize the original dataset.
    - total_rides: Count of all individual rides in each grouping (row). This column was added to help summarize the original dataset.
    - avg_ride_length: The calculated average of all rides in each grouping (row). Look to total_rides to know how many original ride length values were included in this average. This column was added to help summarize the original dataset.
    - min_ride_length: Minimum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.
    - max_ride_length: Maximum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.

    Please note: the time_of_day column has inconsistent spacing. Use mutate(time_of_day = gsub(" ", "", time_of_day)) to remove all spaces.
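For reference, a pandas sketch of the same cleanup plus the weighted mean suggested above; the file name is hypothetical and avg_ride_length is assumed to be numeric (e.g., minutes).

```python
import pandas as pd

# Hypothetical file name; the column names come from the description above.
df = pd.read_csv("2022_bikeshare_all_months.csv")

# Python/pandas equivalent of the R gsub() fix above: strip stray spaces from time_of_day.
df["time_of_day"] = df["time_of_day"].str.replace(" ", "", regex=False)

# Weighted mean of avg_ride_length per rider_type, weighting by total_rides, because each
# row's avg_ride_length is already an average over the rides in that group.
weighted_mean = (
    df.assign(weighted_len=df["avg_ride_length"] * df["total_rides"])
      .groupby("rider_type")
      .apply(lambda g: g["weighted_len"].sum() / g["total_rides"].sum())
)
print(weighted_mean)
```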

    Revisions

    Below is the list of revisions I made in Excel before uploading the final csv files to the R environment:

    • Deleted station location columns and lat/long as much of this data was already missing.

    • Deleted ride id column since each observation was unique and I would not be joining with another table on this variable.

    • Deleted rows pertaining to "docked bikes" since there were no member entries for this type and I could not compare member vs casual rider data. I also received no information in the project details about what constitutes a "docked" bike.

    • Used ride start time and end time to calculate a new column called ride_length (by subtracting), and deleted all rows with 0 and 1 minute results, which were explained in the project outline as being related to staff tasks rather than users. An example would be taking a bike out of rotation for maintenance.

    • Placed start time into a range of times (time_of_day) in order to group more observations while maintaining general time data. time_of_day now represents a time frame when the bike ride BEGAN. I created six 4-hour time frames, beginning at 12AM.

    • Added a Day of Week column, with Sunday = 1 and Saturday = 7, then changed from numbers to the actual day names.

    • Used pivot tables to group total_rides, avg_ride_length, min_ride_length, and max_ride_length by date, rider_type, bike_type, and time_of_day.

    • Combined into one csv file with all months, containing less than 9,000 rows (instead of several million)

  6. ERA5 post-processed daily statistics on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Dec 3, 2025
    + more versions
    Cite
    ECMWF (2025). ERA5 post-processed daily statistics on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.4991cf48
    Explore at:
    Available download formats: grib
    Dataset updated
    Dec 3, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts, http://ecmwf.int/
    Authors
    ECMWF
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:

    - The daily aggregation statistic (daily mean, daily max, daily min, daily sum*)
    - The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
    - The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)

    *The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.
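As a rough illustration of how such a daily statistic is derived from hourly values (this is not the CDS retrieval code; see the documentation referenced above for that), here is a pandas sketch covering the three options listed: statistic, sub-daily sampling frequency, and a fixed UTC offset.

```python
import pandas as pd

def daily_statistic(hourly: pd.Series, stat: str = "mean",
                    frequency_hours: int = 1, utc_offset_hours: int = 0) -> pd.Series:
    """Daily aggregation of an hourly series indexed by UTC timestamps: sub-sample to a
    1-, 3- or 6-hourly frequency, apply a fixed UTC offset, then aggregate per day."""
    sampled = hourly[hourly.index.hour % frequency_hours == 0]            # sub-daily frequency sampling
    shifted = sampled.copy()
    shifted.index = shifted.index + pd.Timedelta(hours=utc_offset_hours)  # local time-zone shift
    return getattr(shifted.resample("1D"), stat)()                        # "mean", "max", "min" or "sum"

# Example with synthetic data (real values would come from the CDS retrieval):
# idx = pd.date_range("2020-01-01", periods=48, freq="h")
# print(daily_statistic(pd.Series(range(48), index=idx), stat="max",
#                       frequency_hours=3, utc_offset_hours=5))
```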

  7. 2006 Commercial Vehicle Survey: origin and destination - Catalogue -...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    + more versions
    Cite
    (2025). 2006 Commercial Vehicle Survey: origin and destination - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-cc38eb0b-c4fa-4bcb-8551-67c06e6b9500
    Explore at:
    Dataset updated
    Oct 19, 2025
    Area covered
    Canada
    Description

    General guidelines
    1. The dataset contains trip origin, destination, commodity group, average daily trips, commodity weight and value.
    2. The data represents activity by medium and heavy trucks only.
    3. The origin and destination data is aggregated by counties in Ontario and province or state outside of Ontario.
    4. The commodities are grouped into 32 groups and empty trucks.
    5. The Commercial Vehicle Survey targets travel on provincial facilities. Therefore, coverage of intra-urban trips is incomplete, and should not be interpreted as representative.
    6. Trip activity within the Greater Toronto Area municipalities is not representative.
    7. The average trip distance is 440 km. Caution must be exercised with short distance trip activities.
    8. All origin-destination pairs with average trip activity of less than one trip per day have been suppressed.

    Field descriptions
    - Origin zone: Trip Origin Zone Number. Zone aggregation is counties in Ontario and province/state for others: 35XX - Ontario counties; 70XX - U.S. states; XX00 (except 3500) - Canadian provinces.
    - Origin name: Trip Origin Name (county or province/state name).
    - Destination zone: Trip Destination Zone Number. Zone aggregation is counties in Ontario and province/state for others; same numbering system as origins.
    - Destination name: Trip Destination Name (county or province/state name).
    - Commodity group code: Unique commodity group numeric code.
    - Commodity group: Descriptive name of the commodity group.
    - Daily trips: Average daily truck trips.
    - Commodity weight: Average daily commodity weight in kilograms (kg).
    - Commodity value: Average daily value of the commodity in dollars ($).

    *[km]: kilometre

  8. Data from: Data for "Why Bananas Look Yellow: The Dominant Hue of Object...

    • data.niaid.nih.gov
    • eprints.soton.ac.uk
    • +1more
    Updated Jul 18, 2024
    Cite
    Christoph Witzel; Haden Dewis (2024). Data for "Why Bananas Look Yellow: The Dominant Hue of Object Colours" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5164859
    Explore at:
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    University of Southampton
    Authors
    Christoph Witzel; Haden Dewis
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These extended supplementary materials go with the article:

    Witzel & Dewis (2022) Why Bananas Look Yellow: The Dominant Hue of Object Colours. Vision Research.

    A. SURVEYS

    A pdf-printout for each of the three Qualtrics surveys illustrates details of the procedure. The layout may have been slightly different in Qualtrics (e.g., wide screen vs portrait display). Also note that the second and third surveys feature a few questions that were unrelated to the dominant-hue study (identifying a grey image).

    B. STIMULI

    The images used in Experiments 1-3, and the animated images used as cues to colour changes in Experiment 3 are packed in zip-files.

    C. CODE

    The Matlab code "onehue_maker.m" is a function that implements the dominant-hue algorithm to produce one-hue images like those in the experiments. To try out the program, the photo of the banana and the mask identifying its background are also uploaded (= first and second input to the function). The purpose of the mask is to remove the background colour from the dominant-hue computations.

    D. DATA

    The uploaded data is not completely raw but has been polished in the following ways:

    Pilot data has been removed (i.e., meaningless data from us and our students to try out, check and polish the survey).

    Incomplete runs have been removed (i.e., when participants quit before completing the whole survey).

    Data irrelevant to this study have been removed (date and time; grey-identification task [see above]).

    There are 3 sheets with data and three sheets with stimulus specifications for each of the three experiments. The stimulus specifications include the measures used in the analyses in "Other Factors" in the Discussion of Experiment 3.

    Columns in the Data sheets are:

    Participant information: recruit (soc med = social media; UG pool = undergraduate students, prolific = https://www.prolific.co/); coldef = Colour deficiencies (1 Yes, 2 No according to test, 3 No without test, 4 Don't know); sex (1 male, 2 female, 3 other); age (in years), and duration (in minutes).

    Main data: Column labels are composed of the following elements, separated by an underscore (_):

    The first 3-5 letters of the object name: ban = banana, car = carrot, cher = cherry, dress = #theDress, fro = frog, gra = grapes, lem = lemon, let = lettuce, ora = orange, pig, ros = rose, shoe = #theShoe, stra = strawberry, zuc = zucchini/courgette.

    A symbol indicating the stimulus condition: 1 = One-Hue, m = Minus-Hue Rotation, p = Plus-Hue Rotation.

    A number identifying the measure: 1 = responded position; 2 = accuracy of the response (1 = correct); 3 = response time (in sec), 4 (Experiment 2-3) = confidence rating (between 0 and 100), 5 (Experiment 3) = cue confidence (cf. Figure 11.a).

    For inverted colours (Experiment 3), the column label starts with an "i" (for inverted).

    Practice Trials: Start with the prefix ex (for example) followed by an underscore (_) and the ID of the object; otherwise, data as in main trials.

    Catch Trials (Experiment 2-3): Start with object name "d" for disk, otherwise, data as in main trials.

    Eidolon Guesses (Experiment 2): Start with "guess" followed by the object ID (see main trials) followed by a number indicating the measure: 1 = response (yes/no), 2 = confidence (if positive response). In case of a positive response, the text entries are saved in the variables starting with guess_txt.
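A hypothetical Python helper that decodes a main-data column label of this kind, assuming the parts are joined exactly as described above (underscore-separated, with a leading "i" for the inverted-colour columns of Experiment 3); the example label is made up for illustration.

```python
# Lookup tables built from the description above.
OBJECTS = {"ban": "banana", "car": "carrot", "cher": "cherry", "dress": "#theDress",
           "fro": "frog", "gra": "grapes", "lem": "lemon", "let": "lettuce", "ora": "orange",
           "pig": "pig", "ros": "rose", "shoe": "#theShoe", "stra": "strawberry",
           "zuc": "zucchini/courgette"}
CONDITIONS = {"1": "One-Hue", "m": "Minus-Hue Rotation", "p": "Plus-Hue Rotation"}
MEASURES = {"1": "responded position", "2": "accuracy (1 = correct)", "3": "response time (s)",
            "4": "confidence rating (0-100)", "5": "cue confidence"}

def decode_label(label: str) -> dict:
    """Split a column label such as "ban_1_2" into object, condition, measure, inverted flag."""
    inverted = label.startswith("i")
    obj, condition, measure = label.removeprefix("i").split("_")
    return {"object": OBJECTS.get(obj, obj),
            "condition": CONDITIONS.get(condition, condition),
            "measure": MEASURES.get(measure, measure),
            "inverted": inverted}

# decode_label("ban_1_2") -> {'object': 'banana', 'condition': 'One-Hue',
#                             'measure': 'accuracy (1 = correct)', 'inverted': False}
```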

    Columns in the stimulus sheets are:

    DomHue: Angle of the dominant hue (cf. Figure 3); as principal components are relative to the average, the angle is relative to the average, not the origin.

    pole1 and pole2: Poles of the dominant hue direction. "pole1_rgb" provides corresponding RGBs for illustration (cf. Figure 1).

    ChromaRescaled: Rescale Factor (see Experiment 3).

    MaxChr: Maximum chroma of the colour distribution in CIELUV.

    M: Average chromaticities (u*, v*) of the colour distribution.

    pc: Coefficients of the first principal component for u* and v*.

    latent & expl: Absolute and relative explained variance, respectively; second column corresponds to orthogonal variance.

    hueM & hueSD: Average and standard deviation of the hue of the colour distribution (cf. Figure 3).

    rot_minus, rot_plus: The hue rotations in the rotated-hue condition (constant minus or plus 5, except for #theShoe).

    oog_1hue, oog_plus, oog_minus: The proportion of out-of-gamut values.

    oogdist_1hue, oogdist_minus, oogdist_plut: Average difference between clipped and original images (in CIELUV).

    Mshift_1hue, Mshift_minus, Mshift_plus: Average and standard deviation of chromaticity shift due to the experimental manipulation (cf. Figure 5 and Table S1).

    Mhueshift_1hue, Mhueshift_minus, Mhueshift_plus: Average and standard deviation of hue shift in CIELUV (cf. Figure S4.d-f and Table S2).

    Lab_shift_1hue, Lab_shift_minus, Lab_shift_plus: Average and standard deviation of chromaticity shift in CIELAB (cf. Figure S4.a-c and Table S1).

    Lab_hueshift_1hue, Lab_hueshift_minus, Lab_hueshift_plus: Average and standard deviation of hue shift in CIELAB (cf. Figure S4.g-i and Table S2).

    Lab_Mhue: Hue of the average colour in CIELAB

    Lab_hueM & Lab_hueSD0: Average and standard deviation of the CIELAB hue distribution.

    huehist0: CIELUV hue histogram; each entry corresponds to the frequencies for 72 bins of 5-deg (cf. Figure 3); the zero indicates that the hue is relative to the origin, not to the average chromaticity.

  9. ⚾ Major League Baseball Hitting ⚾

    • kaggle.com
    zip
    Updated Oct 14, 2023
    Cite
    Shane Simon (2023). ⚾ Major League Baseball Hitting ⚾ [Dataset]. https://www.kaggle.com/datasets/m000sey/major-league-baseball-hitting-data
    Explore at:
    Available download formats: zip (99765 bytes)
    Dataset updated
    Oct 14, 2023
    Authors
    Shane Simon
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Who doesn't love over-analyzing baseball data? Which hitters performed the best? What's the distribution of 'OBP' this year? What hitters over-performed relative to their StatCast underlying data? Let's dig in.

    This year, the MLB instituted a bunch of interesting baseball rules, including bigger bases, a pitch clock, and limited shifts, among others. This undoubtedly changed the offensive environment...

    I got the raw data from www.fangraphs.com and then cleaned it up for everyone to analyze. Happy EDA and let me know if you find any cool trends. Note: I only listed qualified hitters with at least 100 plate appearances.

    I want to add position and sidedness to the datasheet for each hitter. Stay tuned

    Feature descriptions: Name - hitter's name Team - hitter's team (or last team they were on) G = games played AB = # of at bats PA = plate appearances H = hits 1B = singles 2B = doubles 3B = triples HR = home runs R = runs scored RBI = runs batted in BB = bases on balls IBB = intentional bases on balls SO = strike outs HBP = hit by pitch SF = sacrifice fly SH = sacrifice hit GDP = ground into a double play SB = stolen base CS = caught stealing AVG = batting average BB% = BB / PA K% = SO / PA BB/K = BB / SO OBP = on base percentage SLG = slugging percentage OPS = OBP + SLG ISO = SLG - AVG Spd = running speed score BABIP = AVG on balls in play UBR = ultimate base running in runs above average wGDP = weighted ground into double play runs above average wSB = SB and CS runs above average wRC = weighted runs created based on wOBA wRAA = weighted runs above average based on wOBA wOBA = weighted on-base average wRC+ = wRC plus, whereby additional factors are taken into consideration like ball park or era GB/FB = ground ball to fly ball ratio LD% = line drive % (LD / balls in play) GB% = ground ball % (GB / balls in play) Flyball% = flyball %, also commonly known as FB% (Flyball / balls in play) IFFB% = infield flyball % (infield flyballs / flyballs) HR/FB = home runs / flyballs IFH = infield hits IFH% = IFH / GB BUH = bunt hits BUH% = BUH / bunts Pull% = % of balls that were pulled by hitter Oppo% = % of balls that were pushed by hitter Cent% = % of balls that were hit to CF by hitter Soft% = % of balls hit in play that were classified as hit with soft speed Med% = % of balls hit in play that were classified as hit with medium speed Hard% = % of balls hit in play that were classified as hit with hard speed Batted ball = PA - SO - BB - HBP EV = average exit velocity of Batted ball maxEV = maximum exit velocity of Batted ball LA = Launch angle Barrels = a batted ball with an exit velocity of at least 98 mph and LA between 26-30 degrees. For each mph of EV over 98, the LA range gets wider by 1 degree Barrel% = % of Batted balls that are classified as barrels HardHit = # of Batted balls with an EV of 95 or higher HardHit% = % of Batted balls with an EV of 95 or higher xBA = expected batting average xSLG = expected slugging percentage xwOBA = expected weighted on base average Clutch = (Win Probability Added / a hitter's Leverage index for all game events) - (Win Probability Added / Leverage index), which essentially measures how much better a player does in a high leverage situation compared to a neutral situation. O-Swing% = % of pitches a batter swings at outside of the strike zone Z-Swing% = % of pitches a batter swings at inside of the strike zone Swing% = % of total pitches a batter swings at O-Contact% = % of times a batter makes contact with the ball when swinging at pitches thrown outside of the zone Z-Contact% = % of times a batter makes contact with the ball when swinging at pitches thrown inside of the zone Contact% = total percentage of contact made when swinging at all pitches Zone% = % of pitches seen inside the strike zone F-Strike% = First pitch strike percentage SwStr% = Swinging strike % CStr% = Called strike % CSW% = SwStr% + CStr% wFB = How well does the batter do vs fastballs? Using pitch types linear weights wSL = How well does the batter do vs sliders? Using pitch types linear weights wCT = How well does the batter do vs cutters? Using pitch types linear weights wCB = How well does the batter do vs curves? Using pitch types linear weights wCH = How well does the batter do vs change-ups? Using pitch types linear weights wSB = How well does the batter do vs splitters? Using pitch types linear weights wFB/C = How well does the batter do vs fastballs per 100 pitches? wSL = How well does the batter do vs sliders per 100 pitches? wCT = How ...
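As a small illustration of the rate-stat definitions above, here is a Python sketch that recomputes a few of them from the counting stats; the OBP here is simplified and ignores HBP and SF, so it will differ slightly from the dataset's OBP column.

```python
def derived_hitting_stats(pa: int, ab: int, h: int, bb: int, so: int,
                          singles: int, doubles: int, triples: int, hr: int) -> dict:
    """Recompute a few of the rate stats defined above from their counting-stat definitions."""
    avg = h / ab
    slg = (singles + 2 * doubles + 3 * triples + 4 * hr) / ab
    obp_simple = (h + bb) / pa   # simplified: the dataset's OBP also accounts for HBP and SF
    return {
        "AVG": avg,
        "SLG": slg,
        "ISO": slg - avg,            # ISO = SLG - AVG
        "OPS (approx.)": obp_simple + slg,
        "BB%": bb / pa,              # BB% = BB / PA
        "K%": so / pa,               # K% = SO / PA
        "BB/K": bb / so,
    }
```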

  10. Record High Temperatures for US Cities

    • kaggle.com
    zip
    Updated Jan 18, 2023
    Cite
    The Devastator (2023). Record High Temperatures for US Cities [Dataset]. https://www.kaggle.com/datasets/thedevastator/record-high-temperatures-for-us-cities-in-2015
    Explore at:
    Available download formats: zip (9955 bytes)
    Dataset updated
    Jan 18, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    Record High Temperatures for US Cities

    Clearly Defined Monthly Data

    By Gary Hoover [source]

    About this dataset

    This dataset contains all the record-breaking temperatures for your favorite US cities in 2015. With this information, you can prepare for any unexpected weather that may come your way in the future, or just revel in the beauty of these high heat spells from days past! With record highs spanning from January to December, stay warm (or cool) with these handy historical temperature data points

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset contains the record high temperatures for various US cities during 2015. The dataset includes columns for each individual month, along with a column for the record highs over the entire year. This data is sourced from www.weatherbase.com and can be used to analyze which cities experienced hot summers, or to compare temperature variations between different regions.

    Here are some useful tips on how to work with this dataset:
    - Analyze individual monthly temperatures: this dataset allows you to compare high temperatures across months and locations in order to identify which areas experienced particularly hot summers or colder winters.
    - Compare annual versus monthly data: use this data to compare average annual highs against monthly highs in order to understand temperature trends at a given location throughout all four seasons of a single year, or explore how different regions vary based on yearly weather patterns as well as across given months within any one year.
    - Heatmap analysis: use this data to plot temperature information in an interactive heatmap format in order to pinpoint particular regions that experience unique weather conditions or higher-than-average levels of warmth compared against cooler pockets of similar-size geographic areas.
    - Statistically model the relationships between independent variables (temperature variations by month, region/city and more) and dependent variables (e.g., tourism volumes). Use regression techniques such as linear models (OLS), ARIMA models/nonlinear transformations and other methods through statistical software such as STATA or the R programming language.
    - Look into climate trends over longer periods: adjust the time frames included in analyses beyond 2018 when possible by expanding upon the monthly station observations already present within the study timeframe utilized here; take advantage of digitally available historical temperature readings rather than relying only upon printed reports.

    With these helpful tips, you can get started analyzing record high temperatures for US cities during 2015 using our 'Record High Temperatures for US Cities' dataset!

    Research Ideas

    • Create a heat map chart of US cities representing the highest temperature on record for each city from 2015.
    • Analyze trends in monthly high temperatures in order to predict future climate shifts and weather patterns across different US cities.
    • Track and compare monthly high temperature records for all US cities to identify regional hot spots with higher than average records and potential implications for agriculture and resource management planning

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: Highest temperature on record through 2015 by US City.csv

    | Column name | Description |
    |:--- |:--- |
    | CITY | Name of the city. (String) |
    | JAN | Record high temperature for the month of January. (Integer) |
    | FEB | Record high temperature for the month of February. (Integer) |
    | MAR | Record high temperature for the month of March. (Integer) |
    | APR | Record high temperature for the month of April. (Integer) |
    | MAY | Record high temperature for the month of May. (Integer) |
    | JUN | Record high temperature for the month of June. (Integer) |
    | JUL | Record high temperature for the month of July. (Integer) |
    | AUG | Record high temperature for the month of August. (Integer) |
    | SEP | Record high temperature for the month of September. (Integer) |
    | OCT | Record high temperature for the month of October. (Integer) |
    | ... | ... |
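A minimal pandas sketch for working with this file, using the file name from the listing above; the NOV and DEC columns are assumed to follow the truncated column table.

```python
import pandas as pd

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# File name taken from the column listing above.
df = pd.read_csv("Highest temperature on record through 2015 by US City.csv")

# Record high per city across all months, then the ten hottest cities.
df["ANNUAL_RECORD"] = df[MONTHS].max(axis=1)
print(df.sort_values("ANNUAL_RECORD", ascending=False)[["CITY", "ANNUAL_RECORD"]].head(10))
```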

  11. Data from: Identification of stable QTL controlling multiple yield...

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    xlsx
    Updated May 6, 2025
    Cite
    Amanda R. Peters Haugrud; Qijun Zhang; Andrew J. Green; Steven S. Xu; Justin Faris (2025). Data from: Identification of stable QTL controlling multiple yield components in a durum × cultivated emmer wheat population under field and greenhouse conditions [Dataset]. http://doi.org/10.15482/USDA.ADC/1528774
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 6, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Amanda R. Peters Haugrud; Qijun Zhang; Andrew J. Green; Steven S. Xu; Justin Faris
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Phenotypic data: The durum × cultivated emmer recombinant inbred line (RIL) population (referred to as DP527) was evaluated for grain yield components under greenhouse and field conditions in North Dakota, USA. The DP527 population was developed by crossing Divide (PI 642021), a North Dakota hard amber durum variety, with PI 272527, a cultivated emmer accession collected near Pest, Hungary. The DP527 population consisted of 219 RILs developed using the single-seed descent method to the F7 generation and bulked to produce F7:8 RILs. The DP527 population was evaluated under field conditions in a total of three seasons and was grown in a randomized complete block design (RCBD) with three replicates each season. Plants were grown in hill plots, with each plot consisting of 10-15 seeds and considered an experimental unit. The 2017 and 2019 plots were grown at the North Dakota State University (NDSU) field site near Prosper, ND (47.002°N, 97.115°W). The 2020 plots were grown at the NDSU agronomy seed farm near Casselton, ND (46.880°N, 97.243°W). The DP527 population and parental lines were phenotyped for 11 traits including days to heading (DTH), plant height (PHT), total number of spikelets per spike (SPS), kernels per spike (KPS), grain weight per spike (GWS), thousand kernel weight (TKW), kernel area (KA), kernel width (KW), kernel length (KL), kernel circularity (KC), and kernel length:width ratio (KLW). DTH was measured as the number of days from planting until 50% of the spikes emerged completely beyond the flag leaf. PHT was measured from the base of the hill plot to the tip of the highest spike (excluding awns) in the plot in centimeters. Eight heads from each replicate were used for phenotypic evaluations. SPS was counted as the total number of spikelets divided by the number of heads in the sample. KPS, GWS, TKW, KA, KW, KL, KC, and KLW data were obtained using a MARVIN grain analyzer (GAT Sensorik GMBH, Neubrandenburg, Germany). KPS and GWS data from the MARVIN were divided by the number of heads in the sample to obtain an average per wheat head. For the 2019 environment, planting occurred in late May, and by early September about one third of the lines were not mature. Therefore, only DTH, PHT, and SPS were evaluated in the 2019 field season. The DP527 population and parents were evaluated under greenhouse conditions in two greenhouse seasons (2018 and 2019) with two replicates per season. Plants were grown in 15 cm diameter pots in a greenhouse with a 16-h photoperiod and a temperature of 21 °C. All plants were grown in a completely randomized design (CRD) with one plant per pot, which was one experimental unit. DTH was measured as the number of days from planting until the emergence of the first spike beyond the flag leaf, and PHT was measured from the base of the plant to the tip of the highest spike in centimeters. Plants were hand harvested and four heads per plant were used for the rest of the phenotypic evaluations, which were measured as described for field environments. In the data file, column headings indicate the trait evaluated, the year, field vs greenhouse, and replicate or average of all three replicates. For example, “SPS2017Frep1” indicates rep 1 of the spikelets per spike trait collected in the 2017 field trial. Sheet 1 consists of the field data, and sheet 2 is the greenhouse data. An entry of ‘NA’ indicates missing data.

    Genotypic data: DNA of the DP527 population was extracted and genotyped using the Illumina iSelect 90k wheat SNP array.
    The genotypic data file consists of the chromosome assignments of the markers, the marker names, the linkage map positions of the markers, and the genotypic calls for each marker within each RIL, where “1” represents an allele from Divide, “2” represents an allele from PI 272527, and “3” indicates missing data. This data was used to assemble the linkage-based genetic maps for the 14 durum wheat chromosomes and further used in statistical analyses to identify chromosome regions harboring genes associated with the various phenotypic traits mentioned in the phenotypic data file.

    Resources in this dataset:
    - Resource Title: Genotypic data for the durum x emmer wheat recombinant inbred population DP527. File Name: DP527 genotypic data.xlsx. Resource Description: The genotypic data file consists of the chromosome assignments of the markers, the marker names, the linkage map positions of the markers, and the genotypic calls for each marker within each RIL, where “1” represents an allele from Divide, “2” represents an allele from PI 272527, and “3” indicates missing data.
    - Resource Title: Phenotypic data collected from the durum x emmer wheat recombinant inbred population DP527. File Name: DP527 phenotypic data.xlsx. Resource Description: In the data file, column headings indicate the trait evaluated, the year, field vs greenhouse, and replicate or average of all three replicates. For example, “SPS2017Frep1” indicates rep 1 of the spikelets per spike trait collected in the 2017 field trial. Sheet 1 consists of the field data, and sheet 2 is the greenhouse data. An entry of ‘NA’ indicates missing data.
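A hypothetical Python helper for splitting those phenotypic column headings into their parts; the suffix used for the across-replicate average is not given above, so it is treated as an assumption here.

```python
import re

# Heading pattern: trait letters, 4-digit year, F (field) or G (greenhouse), then the
# replicate ("rep1", "rep2", ...) or an assumed "avg" suffix for the across-replicate average.
HEADING = re.compile(r"^(?P<trait>[A-Za-z]+?)(?P<year>\d{4})(?P<env>[FG])(?P<rep>rep\d+|avg)?$")

def parse_heading(heading: str):
    """Return the parts of a heading such as "SPS2017Frep1", or None if it does not match."""
    match = HEADING.match(heading)
    return match.groupdict() if match else None

# parse_heading("SPS2017Frep1") -> {'trait': 'SPS', 'year': '2017', 'env': 'F', 'rep': 'rep1'}
```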

  12. HBAI

    • datacatalogue.ukdataservice.ac.uk
    Updated Apr 16, 2025
    Cite
    Department for Work and Pensions (2025). HBAI [Dataset]. http://doi.org/10.5255/UKDA-SN-5828-17
    Explore at:
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    UK Data Service, https://ukdataservice.ac.uk/
    Authors
    Department for Work and Pensions
    Time period covered
    Mar 31, 1994 - Mar 31, 2024
    Area covered
    United Kingdom
    Description

    The Households Below Average Income (HBAI) data presents information on living standards in the UK based on household income measures for the financial year.

    HBAI uses equivalised disposable household income as a proxy for living standards in order to allow comparisons of the living standards of different types of households (that is, income is adjusted to take into account variations in the size and composition of the households in a process known as equivalisation). A key assumption made in HBAI is that all individuals in the household benefit equally from the combined income of the household. This enables the total equivalised income of the household to be used as a proxy for the standard of living of each household member.
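For intuition only, here is a minimal Python sketch of equivalisation using one common form of the modified OECD scale; the exact scale and rescaling HBAI applies are defined in its own documentation, so the numbers below are illustrative assumptions rather than the HBAI method.

```python
def equivalised_income(household_income: float, extra_adults: int, children_under_14: int) -> float:
    """Illustrative equivalisation: first adult = 1.0, each additional adult = 0.5,
    each child under 14 = 0.3 (one common form of the modified OECD scale; an assumption here)."""
    scale = 1.0 + 0.5 * extra_adults + 0.3 * children_under_14
    return household_income / scale

# Two households with the same total income but different compositions:
# equivalised_income(40_000, extra_adults=1, children_under_14=2) -> ~19,048
# equivalised_income(40_000, extra_adults=0, children_under_14=0) -> 40,000
```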

    In line with international best practice, the income measures used in HBAI are subject to several statistical adjustments and, as such, are not always directly relatable to income amounts as they might be understood by people on a day-to-day basis. These adjustments, however, allow consistent comparison over time and across households of different sizes and compositions. HBAI uses variants of CPI inflation when estimating how incomes are changing in real terms over time.

    The main data source used in this study is the Family Resources Survey (FRS), a continuous cross-sectional survey. The FRS normally has a sample of 19,000 - 20,000 UK households. The use of survey data means that HBAI estimates are subject to uncertainty, which can affect how changes should be interpreted, especially in the short term. Analysis of geographies below the regional level is not recommended from this data.

    Further information and the latest publication can be found on the gov.uk HBAI webpage. The HBAI team want to provide user-friendly datasets and clearer documentation, so please contact team.hbai@dwp.gov.uk if you have any suggestions or feedback on the new harmonised datasets and documentation.

    An earlier HBAI study, Institute for Fiscal Studies Households Below Average Income Dataset, 1961-1991, is held under SN 3300.

    Latest Edition Information

    For the 19th edition (April 2025), resamples data have been added to the study alongside supporting documentation. Main data back to 1994/95 have been updated to latest-year prices, and the documentation has been updated accordingly.

    Using the HBAI files

    Users should note that either 7-Zip or a recent version of WinZip is needed to unzip the HBAI download zip files, due to their size. The inbuilt Windows compression software will not handle them correctly.

    Labelling of variables
    Users should note that many variables across the resamples files do not include full variable or value labels. This information can be found easily in the documentation - see the Harmonised Data Variables Guide.

    HBAI versions

    The HBAI datasets are available in two versions at the UKDS:

    1. End User Licence (EUL) (Anonymised) Datasets:

    These datasets contain no names, addresses, telephone numbers, bank account details, NINOs or any personal details that can be considered disclosive under the terms of the ONS Disclosure Control guidance. Changes made to the datasets are as follows:

    • All ages above 80 are instead top-coded to 80 years of age.
    • The variable for the amount of Council Tax liability for the household and pensioner flags for the head and spouse have been removed.
    • All amount variables have been rounded to the nearest £1.
    • A very small number of large households (with 10 or more individuals) have been removed from the dataset.

    2. Secure Access Datasets:

    Secure Access datasets for HBAI are held under SN 7196. The Secure Access data are not subject to the same edits as the EUL version and are, therefore, more disclosive and subject to strict access conditions. They are currently only available to UK HE/FE applicants. Prospective users of the Secure Access version of the HBAI must fulfil additional requirements beyond those associated with the EUL datasets.

  13. US Births 👶 by Year, State, and Education Level

    • kaggle.com
    zip
    Updated May 8, 2023
    Cite
    Random Draw (2023). US Births 👶 by Year, State, and Education Level [Dataset]. https://www.kaggle.com/datasets/danbraswell/temporary-us-births/code
    Explore at:
    Available download formats: zip (61286 bytes)
    Dataset updated
    May 8, 2023
    Authors
    Random Draw
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Introduction

    This dataset provides birth rates and related data across the 50 states and DC from 2016 to 2021. The data was sourced from the Centers for Disease Control and Prevention (CDC) and includes detailed information such as number of births, gender, birth weight, state, and year of the delivery. A particular emphasis is given to detailed information on the mother's educational level. With this dataset, one can, for example, examine trends and patterns in birth rates across different academic groups and geographic locations.

    Important Note

    Each row in the dataset is considered a category defined by the state, birth year, baby's gender, and educational level of the mother. Three quantities are given for each category: number of births, mother's average age, and average baby weight. The CDC is sensitive to potentially disclosing personal information, so any category with fewer than ten births is suppressed. For this reason, you will find 12 rows missing out of an expected 5,508 (51 states × 6 years × 2 genders × 9 education levels = 5,508); a quick way to check this against the file is shown in the sketch after the column descriptions below. Those missing rows all had the mother's educational level listed as "unknown or not stated", and their absence should not significantly impact studies or conclusions made using the dataset.

    Origin

    The data in this dataset was obtained using CDC's WONDER retrieval tool on the CDC Natality page

    Column Descriptions

    • State ➡️ state name in full (includes District of Columbia)
    • State Abbreviation ➡️ 2-character state abbreviation
    • Year ➡️ 4-digit year
    • Gender ➡️ Gender of baby
    • Education Level of Mother ➡️ See table below
    • Education Level Code ➡️ See table below
    • Number of Births ➡️ Number of births for the category
    • Average Age of Mother (years) ➡️ Mother's average age in the category
    • Average Birth Weight (g) ➡️ Average birth weight in the category
      Education levels and codes used in the dataset:

    | Code | Mother's Education Level |
    |:--- |:--- |
    | 1 | 8th grade or less |
    | 2 | 9th through 12th grade with no diploma |
    | 3 | High school graduate or GED completed |
    | 4 | Some college credit, but not a degree |
    | 5 | Associate degree (AA, AS) |
    | 6 | Bachelor's degree (BA, AB, BS) |
    | 7 | Master's degree (MA, MS, MEng, MEd, MSW, MBA) |
    | 8 | Doctorate (PhD, EdD) or Professional Degree (MD, DDS, DVM, LLB, JD) |
    | -9 | Unknown or Not Stated |
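A minimal pandas sketch that applies the code table above and checks the expected category count from the note earlier; the file name is hypothetical.

```python
import pandas as pd

EDUCATION_LEVELS = {
    1: "8th grade or less",
    2: "9th through 12th grade with no diploma",
    3: "High school graduate or GED completed",
    4: "Some college credit, but not a degree",
    5: "Associate degree (AA, AS)",
    6: "Bachelor's degree (BA, AB, BS)",
    7: "Master's degree (MA, MS, MEng, MEd, MSW, MBA)",
    8: "Doctorate (PhD, EdD) or Professional Degree (MD, DDS, DVM, LLB, JD)",
    -9: "Unknown or Not Stated",
}

# Expected number of categories, as noted above: 51 states * 6 years * 2 genders * 9 education levels.
EXPECTED_ROWS = 51 * 6 * 2 * 9   # 5,508

# Hypothetical file name; the Kaggle download may use a different one.
df = pd.read_csv("us_births_2016_2021.csv")
print(f"Suppressed categories: {EXPECTED_ROWS - len(df)}")   # the description reports 12
print(df["Education Level Code"].map(EDUCATION_LEVELS).value_counts())
```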

    Acknowledgement

    Image by Sarah Richter from Pixabay

  14. Inflation Data

    • dataverse.unc.edu
    • dataverse-staging.rdmc.unc.edu
    Updated Oct 9, 2022
    Cite
    UNC Dataverse (2022). Inflation Data [Dataset]. http://doi.org/10.15139/S3/QA4MPU
    Explore at:
    Dataset updated
    Oct 9, 2022
    Dataset provided by
    UNC Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is not going to be an article or Op-Ed about Michael Jordan. Since 2009 we've been in the longest bull market in history, that's 11 years and counting. However, a few metrics like the stock market P/E, the call-to-put ratio and of course the Shiller P/E suggest a great crash is coming, in between the levels of 1929 and the dot-com bubble. Mean reversion historically is inevitable, and the Fed's money-printing experiment could end in disaster for the stock market in late 2021 or 2022. You can read Jeremy Grantham's Last Dance article here. You are likely well aware of Michael Burry's predicament as well. It's easier for you just to skim through two related videos on this topic of a stock market crash: Michael Burry's warning, see this YouTube; Jeremy Grantham's warning, see this YouTube. Typically when there is a major event in the world, there is a crash and then a bear market and a recovery that takes many, many months. In March 2020 that's not what we saw, since the Fed did some astonishing things that meant a liquidity glut and the risk of a major inflation event. The pandemic represented the quickest decline of at least 30% in the history of the benchmark S&P 500, but the recovery was not correlated to anything but Fed intervention. Since the pandemic clearly isn't disappearing and many sectors such as travel, business travel, tourism and supply chains appear significantly disrupted, the so-called economic recovery isn't so great. And there's this little problem at the heart of global capitalism today: the stock market just keeps going up. Crashes and corrections typically occur frequently in a normal market. But the Fed liquidity and irresponsible printing of money is creating a scenario where normal behavior isn't occurring on the markets. According to data provided by market analytics firm Yardeni Research, the benchmark index has undergone 38 declines of at least 10% since the beginning of 1950. Since March 2020 we've barely seen a down month. September 2020 was flat-ish. The S&P 500 has more than doubled since those lows. Look at the angle of the curve: the S&P 500 was 735 at the low in 2009, so in this bull market alone it has gone up 6x in valuation. That's not a normal cycle, and it could mean we are due for an epic correction. I have to agree with the analysts who claim that the long, long bull market since 2009 has finally matured into a fully-fledged epic bubble. There is a complacency, buy-the-dip frenzy and general meme environment around what BigTech can do in such an environment. The weight of Apple, Amazon, Alphabet, Microsoft, Facebook, Nvidia and Tesla together in the S&P and Nasdaq is approaching a ridiculous weighting. When these stocks are seen simultaneously as growth, value and companies with unbeatable moats, the entire dynamics of the stock market begin to break down. Check out FANG during the pandemic. BigTech is seen as bullet-proof; meme valuations and hysterical speculative behavior lead to even higher highs, even as 2020 offered many younger people an on-ramp into investing for the first time. Some analysts at JP Morgan are even saying that until retail investors stop charging into stocks, markets probably don't have too much to worry about. Hedge funds with payment for order flow can predict exactly how these retail investors are behaving and monetize them. PFOF might even have to be banned by the SEC. The risk-on market theoretically just keeps going up until the Fed raises interest rates, which could be in 2023!
    For some context, we're more than 1.4 years removed from the bear-market bottom of the coronavirus crash and haven't had even a 5% correction in nine months. This is the most over-priced the market has likely ever been. At the height of the dot-com bubble the S&P 500 was only 1,400. Today it is 4,500, not so many years after. Clearly something is not quite right if you look at history and the P/E ratios. A market pumped with liquidity produces higher earnings with historically low interest rates; it's an environment where dangerous things can occur. In late 1997, as the S&P 500 passed its previous 1929 peak of 21x earnings, that seemed like a lot, but nothing compared to today. For some context, the S&P 500 Shiller P/E closed last week at 38.58, which is nearly a two-decade high. It's also well over double the average Shiller P/E of 16.84, dating back 151 years. So the stock market is likely around 2x over-valued. Try to think rationally about what this means for valuations today and your favorite stock prices: what should they be in historical terms? The S&P 500 is up 31% in the past year. It will likely hit 5,000 before a correction given the amount of added liquidity in the system and the QE the Fed is using, which is like a huge abuse of MMT, or Modern Monetary Theory. This has also led to bubbles in the housing market, crypto and even commodities like gold, with long-term global GDP meeting many headwinds in the years ahead due to a demographic shift of an ageing population and significant technological automation. So if you think that stocks or equities or ETFs are the best place to put your money in 2022, you might want to think again. The crash of the OTC and small-cap market since February 2021 has been quite an indication of what a correction looks like. According to the Motley Fool, what happens after major downturns in the market, historically speaking? In each of the previous four instances that the S&P 500's Shiller P/E shot above and sustained 30, the index lost anywhere from 20% to 89% of its value. That's what we too are due for; reversion to the mean will be realistically brutal after the Fed's hyper-extreme intervention has run its course. Of course, what the Fed stimulus has really done is simply allowed the 1% to get a whole lot richer, to the point of wealth inequality spiraling out of control in the decades ahead, leading us likely to a dystopia in an unfair and unequal version of BigTech capitalism. This has also led to a trend of short squeezes in these tech stocks, as shown in recent years' data. Of course the Fed has to say that it's done all of these things for the people, employment numbers and the labor market. Women in the workplace have been set behind likely 15 years in social progress due to the pandemic and the Fed's response. While the 89% lost during the Great Depression would be virtually impossible today thanks to ongoing intervention from the Federal Reserve and Capitol Hill, a correction of 20% to 50% would be pretty fair and simply return the curve back to a normal trajectory as interest rates go back up eventually in the 2023 to 2025 period. It's very unlikely the market has taken Fed tapering into account (priced it in), since the euphoria of a can't-miss market just keeps pushing the markets higher. But all good things must come to an end. Earlier this month, the U.S. Bureau of Labor Statistics released inflation data from July. This report showed that the Consumer Price Index for All Urban Consumers rose 5.2% over the past 12 months.
    While the Fed and economists promise us this inflation is temporary, others are not so certain. As you print so much money, the money you have is worth less and certain goods cost more. Wage gains in some industries cannot be taken back; they are permanent - in the service sector, for example restaurants, hospitality and travel, which have been among the hardest hit. The pandemic has led to a paradigm shift in the future of work, and that too is not temporary. The Great Resignation means white-collar jobs will be more WFH than ever before, with a new software revolution, different transport and energy behaviors and so forth. Climate change alone could slow down global GDP in the 21st century. How can inflation be temporary when so many trends don't appear to be temporary? Sure, the price of lumber or used cars could be temporary, but a global chip shortage is exacerbating problems in the automobile sector.

    The stock market isn't even behaving like it cares about anything other than the Fed and its billions of dollars of bond buying each month. Some central banks will start to taper around December 2021 (like the European Central Bank). However, Delta could further mutate into a variant that makes the first generation of vaccines less effective, and such a macro event could be enough to trigger the correction we've been speaking about. So stay safe, and keep your money safe. The Last Dance of the 2009 bull market could feel especially painful because we've been spoiled for so long in the markets. We can barely remember what March 2020 felt like. Some people sold their life savings simply due to scare tactics by the likes of Bill Ackman; his scare tactics on CNBC likely won him hundreds of millions as the stock market tanked. Hedge funds further gamed the Reddit and GameStop movement, orchestrating it and leading the new retail investors into meme speculation and a whole bunch of other unsavory things like options trading at a scale we've never seen before.

    It's not just inflation and higher interest rates, it's how absurdly high valuations have become. Still, correlation does not imply causation: just because inflation has picked up, it doesn't guarantee that stocks will head lower. Nevertheless, weaker buying power associated with higher inflation can't be overlooked as a potential negative for the U.S. economy and equities. The current S&P 500 10-year P/E ratio is 38.7. This is 97% above the modern-era market average of 19.6, putting the current P/E 2.5 standard deviations above the modern-era average. This is just math, folks. History is saying the stock market is at 2x its true value. So why, and who, would be all-in on the market, or on an asset class like crypto that is mostly speculative in nature to begin with? Study the following on a historical basis, and do your own due diligence as to the health of the markets: the debt-to-GDP ratio and the call-to-put ratio.
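    As a quick sanity check on the arithmetic quoted above, here is a minimal Python sketch using only the figures cited in this description (a current Shiller P/E of 38.7, a modern-era average of 19.6, a 2.5 standard deviation gap, and an index level of 4,500); the implied standard deviation is derived from those numbers rather than taken from any source data:

        # Back-of-envelope check of the valuation figures quoted above.
        current_pe = 38.7
        modern_avg_pe = 19.6
        sp500_level = 4500

        # How far above the modern-era average is the current multiple?
        overvaluation = current_pe / modern_avg_pe - 1.0
        print(f"P/E above modern-era average: {overvaluation:.0%}")          # ~97%

        # The 2.5-sigma claim implies this standard deviation for the series.
        implied_sigma = (current_pe - modern_avg_pe) / 2.5
        print(f"Implied std. dev. of the P/E series: {implied_sigma:.1f}")   # ~7.6

        # "Reversion to the mean" in these terms: the index level consistent
        # with the average multiple, holding earnings fixed.
        fair_level = sp500_level * modern_avg_pe / current_pe
        print(f"Index level at the average multiple: {fair_level:.0f}")      # ~2279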

  15. Employed Employees by Average Actual Weekly Hours Worked and Average Weekly...

    • data.urbandatacentre.ca
    • open.alberta.ca
    Updated Oct 19, 2025
    Cite
    (2025). Employed Employees by Average Actual Weekly Hours Worked and Average Weekly and Hourly Earnings in Alberta (Annual Average) (1997 - 2010) [Dataset]. https://data.urbandatacentre.ca/dataset/ab-employed-employees-by-average-actual-weekly-hours-worked-and-earnings-in-alberta-1997-2010
    Explore at:
    Dataset updated
    Oct 19, 2025
    Area covered
    Alberta
    Description

    (StatCan Product) Employed employees in Alberta by average actual weekly hours worked and average weekly and hourly earnings (annual averages).

    Customization details: This information product has been customized to present information on employed employees in Alberta by average actual weekly hours worked and average weekly and hourly earnings (annual averages), with variables for full-time and part-time employment.

    Labour characteristics variables included: Total Employees; Average Actual Hours Worked; Average Weekly Earnings; Average Hourly Earnings.

    Age and other groups (for Both Sexes, Men and Women) include: 15+ years; 15-24 years; 25+ years; Union Coverage; Non-Union Coverage; Permanent; Temporary; 1-19 Employees; 20-99 Employees; 100-500 Employees; > 500 Employees; Management; Business, Finance and Admin; Nat. and Applied Sciences; Health; Social Sciences, Educ.; Art, Culture, Recreation, etc.; Sales and Services; Trades, Transportation, etc.; Unique to Primary Industry; Processing, Manuf. and Util.; Total; 0-8 years; Some High School; High School Graduate; Some Post-Secondary; Post-Secondary Certif. or Diploma; University Degree; University Degree - Bachelors; University Degree - Post Grad.

    Labour Force Survey: The Canadian Labour Force Survey was developed following the Second World War to satisfy a need for reliable and timely data on the labour market. Information was urgently required on the massive labour market changes involved in the transition from a war to a peace-time economy. The main objective of the LFS is to divide the working-age population into three mutually exclusive classifications - employed, unemployed, and not in the labour force - and to provide descriptive and explanatory data on each of these.

    Target population: The LFS covers the civilian, non-institutionalized population 15 years of age and over. It is conducted nationwide, in both the provinces and the territories. Excluded from the survey's coverage are: persons living on reserves and other Aboriginal settlements in the provinces; full-time members of the Canadian Armed Forces; and the institutionalized population. These groups together represent an exclusion of less than 2% of the Canadian population aged 15 and over. National Labour Force Survey estimates are derived using the results of the LFS in the provinces. Territorial LFS results are not included in the national estimates, but are published separately.

    Documentation – Labour Force Survey

    Instrument design: The current LFS questionnaire was introduced in 1997. At that time, significant changes were made to the questionnaire in order to address existing data gaps, improve data quality and make more use of the power of Computer Assisted Interviewing (CAI). The changes incorporated included the addition of many new questions. For example, questions were added to collect information about wage rates, union status, job permanency and workplace size for the main job of currently employed employees. Other additions included new questions to collect information about hirings and separations, and expanded response category lists that split existing codes into more detailed categories.

    Sampling: This is a sample survey with a cross-sectional design.

    Data sources: Responding to this survey is mandatory. Data are collected directly from survey respondents. Data collection for the LFS is carried out each month during the week following the LFS reference week. The reference week is normally the week containing the 15th day of the month.
    LFS interviews are conducted by telephone by interviewers working out of a regional office CATI (Computer Assisted Telephone Interviews) site or by personal visit from a field interviewer. Since 2004, dwellings new to the sample in urban areas are contacted by telephone if the telephone number is available from administrative files; otherwise the dwelling is contacted by a field interviewer. The interviewer first obtains socio-demographic information for each household member and then obtains labour force information for all members aged 15 and over who are not members of the regular armed forces. The majority of subsequent interviews are conducted by telephone. In subsequent monthly interviews the interviewer confirms the socio-demographic information collected in the first month and collects the labour force information for the current month. Persons aged 70 and over are not asked the labour force questions in subsequent interviews; rather, their labour force information is carried over from their first interview. In each dwelling, information about all household members is usually obtained from one knowledgeable household member. Such 'proxy' reporting, which accounts for approximately 65% of the information collected, is used to avoid the high cost and extended time requirements that would be involved in repeat visits or calls necessary to obtain information directly from each respondent.

    Error detection: The LFS CAI questionnaire incorporates many features that serve to maximize the quality of the data collected. There are many edits built into the CAI questionnaire to compare the entered data against unusual values, as well as to check for logical inconsistencies. Whenever an edit fails, the interviewer is prompted to correct the information (with the help of the respondent when necessary). For most edit failures the interviewer has the ability to override the edit failure if they cannot resolve the apparent discrepancy. As well, for most questions the interviewer has the ability to enter a response of Don't Know or Refused if the respondent does not answer the question. Once the data is received back at head office, an extensive series of processing steps is undertaken to thoroughly verify each record received. This includes the coding of industry and occupation information and the review of interviewer-entered notes. The editing and imputation phases of processing involve the identification of logically inconsistent or missing information items, and the correction of such conditions. Since the true value of each entry on the questionnaire is not known, the identification of errors can be done only through recognition of obvious inconsistencies (for example, a 15-year-old respondent who is recorded as having last worked in 1940).

    Estimation: The final step in the processing of LFS data is the assignment of a weight to each individual record. This process involves several steps. Each record has an initial weight that corresponds to the inverse of the probability of selection. Adjustments are made to this weight to account for non-response that cannot be handled through imputation. In the final weighting step all of the record weights are adjusted so that the aggregate totals match independently derived population estimates for various age-sex groups by province and major sub-provincial areas. One feature of the LFS weighting process is that all individuals within a dwelling are assigned the same weight.
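    The weighting steps just described (a design weight equal to the inverse of the probability of selection, followed by an adjustment so that weighted totals match independently derived population estimates by age-sex group) can be illustrated with a small sketch. This is a simplified illustration with hypothetical column names and control totals, not Statistics Canada's production methodology, and it does not enforce the LFS feature that all members of a dwelling keep the same final weight:

        # Simplified illustration of design weighting plus post-stratification.
        # Column names and control totals are hypothetical.
        import pandas as pd

        sample = pd.DataFrame({
            "dwelling_id":   [1, 1, 2, 3],
            "age_sex_group": ["M25-54", "F25-54", "M25-54", "F55+"],
            "p_selection":   [0.001, 0.001, 0.002, 0.001],  # same within a dwelling
        })

        # Step 1: design weight = inverse of the probability of selection.
        sample["design_wt"] = 1.0 / sample["p_selection"]

        # Step 2: adjust weights so that weighted totals match independent
        # population estimates (hypothetical control totals per age-sex group).
        controls = {"M25-54": 1_500_000, "F25-54": 1_600_000, "F55+": 900_000}
        group_totals = sample.groupby("age_sex_group")["design_wt"].transform("sum")
        sample["final_wt"] = (sample["design_wt"]
                              * sample["age_sex_group"].map(controls)
                              / group_totals)

        print(sample[["dwelling_id", "age_sex_group", "design_wt", "final_wt"]])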
    In January 2000, the LFS introduced a new estimation method called Regression Composite Estimation. This new method was used to re-base all historical LFS data. It is described in the research paper "Improvements to the Labour Force Survey (LFS)", Catalogue no. 71F0031X. Additional improvements are introduced over time; they are described in different issues of the same publication.

    Data accuracy: Since the LFS is a sample survey, all LFS estimates are subject to both sampling errors and non-sampling errors. Non-sampling errors can arise at any stage of the collection and processing of the survey data. These include coverage errors, non-response errors, response errors, interviewer errors, coding errors and other types of processing errors. Non-response to the LFS tends to average about 10% of eligible households. Interviewers are instructed to make all reasonable attempts to obtain LFS interviews with members of eligible households. Each month, after all attempts to obtain interviews have been made, a small number of non-responding households remain; a weight adjustment is applied to account for these non-responding households. Sampling errors associated with survey estimates are measured using coefficients of variation for LFS estimates as a function of the size of the estimate and the geographic area.

  16. Facebook: distribution of global audiences 2025, by age and gender

    • statista.com
    Cite
    Statista, Facebook: distribution of global audiences 2025, by age and gender [Dataset]. https://www.statista.com/statistics/376128/facebook-global-user-age-distribution/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2025
    Area covered
    Worldwide
    Description

    As of October 2025, men aged 25 to 34 represented Facebook's largest user group, making up **** percent of the platform's global audience. Across all age groups except those aged 65 and older, male users outnumbered female users.

    Facebook connects the world

    Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.

    Facebook users

    The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with *** million users. The United States, Brazil, and Indonesia also all have more than 100 million Facebook users each.

  17. Road safety statistics: data tables

    • gov.uk
    Updated Nov 27, 2025
    Cite
    Department for Transport (2025). Road safety statistics: data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/reported-road-accidents-vehicles-and-casualties-tables-for-great-britain
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Transport
    Description

    These tables present high-level breakdowns and time series. A list of all tables, including those discontinued, is available in the table index. More detailed data is available in our data tools, or by downloading the open dataset.

    We are proposing to make some changes to these tables in future; further details can be found alongside the latest provisional statistics.

    Latest data and table index

    The tables below are the latest final annual statistics for 2024, which are currently the latest available data. Provisional statistics for the first half of 2025 are also available, with provisional data for the whole of 2025 scheduled for publication in May 2026.

    A list of all reported road collisions and casualties data tables and variables in our data download tool is available in the Tables index (ODS, 28.9 KB): https://assets.publishing.service.gov.uk/media/6925869422424e25e6bc3105/reported-road-casualties-gb-index-of-tables.ods

    All collision, casualty and vehicle tables

    Reported road collisions and casualties data tables (zip file) (ZIP, 11.2 MB): https://assets.publishing.service.gov.uk/media/68d42292b6c608ff9421b2d2/ras-all-tables-excel.zip

    Historic trends (RAS01)

    RAS0101: Collisions, casualties and vehicles involved by road user type since 1926 (ODS, 34.7 KB): https://assets.publishing.service.gov.uk/media/68d3cdeeca266424b221b253/ras0101.ods

    RAS0102: Casualties and casualty rates, by road user type and age group, since 1979 (ODS, 129 KB): https://assets.publishing.service.gov.uk/media/68d3cdfee65dc716bfb1dcf3/ras0102.ods

    Road user type (RAS02)

    RAS0201: Numbers and rates (ODS, 37.5 KB): https://assets.publishing.service.gov.uk/media/68d3ce0bc908572e81248c1f/ras0201.ods

    RAS0202: Sex and age group (ODS, 178 KB): https://assets.publishing.service.gov.uk/media/68d3ce17b6c608ff9421b25e/ras0202.ods

    RAS0203: Rates by mode, including air, water and rail modes (ODS, 24.2 KB): https://assets.publishing.service.gov.uk/media/67600227b745d5f7a053ef74/ras0203.ods - this table will be updated for 2024 once data is available for other modes.

    Road type (RAS03)

    RAS0301: Speed limit, built-up and non-built-up roads (ODS): https://assets.publishing.service.gov.uk/media/68d3ce2b8c739d679fb1dcf6/ras0301.ods
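    A minimal sketch for loading one of the ODS tables linked above with pandas (this assumes the optional odfpy dependency is installed; the sheet layout and header rows vary by table, so inspect the sheet names before relying on any of them):

        # Load RAS0101 (linked above) directly from its published URL.
        # Requires: pip install pandas odfpy
        import pandas as pd

        url = ("https://assets.publishing.service.gov.uk/media/"
               "68d3cdeeca266424b221b253/ras0101.ods")

        sheets = pd.read_excel(url, engine="odf", sheet_name=None)  # dict of DataFrames
        print(list(sheets))                 # inspect the available sheet names first
        df = next(iter(sheets.values()))    # pick one sheet to look at
        print(df.head())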

  18. Instagram accounts with the most followers worldwide 2024

    • statista.com
    • de.statista.com
    Cite
    Stacy Jo Dixon, Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024. The Portuguese footballer is the most-followed person on the photo-sharing platform, with 628 million followers, while Instagram's own account ranked first overall with roughly 672 million followers.

    How popular is Instagram?

    Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States, and experts project this figure to surpass 127 million users in 2023.

    Who uses Instagram?

    Instagram audiences are predominantly young: recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms for teens and one of the social networks with the biggest reach among teens in the United States.

    Celebrity influencers on Instagram

    Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
    
  19. 🥏 2024 College Ultimate Championship Statistics

    • kaggle.com
    zip
    Updated Oct 21, 2024
    Cite
    mexwell (2024). 🥏 2024 College Ultimate Championship Statistics [Dataset]. https://www.kaggle.com/datasets/mexwell/2024-college-ultimate-championship-statistics
    Explore at:
    zip(35717 bytes)Available download formats
    Dataset updated
    Oct 21, 2024
    Authors
    mexwell
    Description

    Motivation

    Ultimate, also known as Ultimate Frisbee, is a non-contact sport where 2 teams of 7 players pass a frisbee to each other and try to reach the opposing team's 'end zone' to score a point without dropping the frisbee. Each point starts with a pull, where the defensive team throws the frisbee from their end zone to the end zone of the offensive team. The goal of the defensive team is to force the offensive team to turn over the disc by dropping it or throwing it out of bounds. If the defensive team forces a turnover, they can then try to score while the offensive team tries to stop them. The team that scores a point then executes the next pull to the team that got scored on.

    College ultimate games follow the USA Ultimate rules, which dictate a game to 15 goals, with halftime occurring when a team has scored 8 goals. The rules also allow for a time cap, where once the cap is reached the game is played to a score determined by the score of the team that is winning. As in other sports, there are two main positions in ultimate, called handler and cutter. Teams often have 2 or 3 handlers playing at a time; they are versatile throwers who orchestrate how the offense is run. Cutters are players who run around and try to escape defenders in order to receive the frisbee from the handlers. Teams have different formations and offensive schemes they use to try to find openings that make gaining yardage and scoring points easier. Teams tend to have offensive and defensive lines in order to save players from playing every point and split the load of games.

    Measured Statistics Data

    This data comes from the 2024 Division 1 and 3 Men's and Women's College Ultimate Championships. It contains scoring and defensive statistics for players from each game played at the two tournaments. The table has 1665 rows and 16 columns, with 1 row for each player containing all the statistics for that player.

    Variable Descriptions

    - player: player name
    - level: the level that the player was competing at
    - gender: gender of the player's division
    - division: level and gender of the competing player's division
    - team_name: full name of the player's team
    - Turns: the number of turnovers the player threw
    - Ds: the number of defensive blocks the player made
    - Assists: the number of assists the player threw
    - Points: the number of points the player scored
    - plus_minus: the point differential of the player for offensive points
    - team_games: the number of games played
    - turns_per_game: the average turnovers per game
    - ds_per_game: the average defensive blocks per game
    - ast_per_game: the average assists per game
    - pts_per_game: the average points per game
    - pls_mns_per_game: the average plus minus per game

    Questions

    • What are some ways to graphically represent variables together to compare divisions?
    • What might strengths in different variables mean about players/teams?
    • What variables differ discernibly between different divisions?
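    One possible starting point for these questions is sketched below; it assumes the Kaggle CSV has been saved locally under a placeholder file name and uses the per-game columns listed above:

        # Compare divisions using per-game statistics from the dataset.
        import pandas as pd
        import matplotlib.pyplot as plt

        df = pd.read_csv("2024_college_ultimate_championship_stats.csv")  # placeholder name

        # Per-division averages of a few per-game statistics.
        summary = (df.groupby("division")[["ast_per_game", "ds_per_game", "turns_per_game"]]
                     .mean())
        print(summary)

        # Side-by-side distributions of plus/minus per game by division.
        df.boxplot(column="pls_mns_per_game", by="division", grid=False)
        plt.ylabel("plus/minus per game")
        plt.tight_layout()
        plt.show()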

    References

    Statistics were found on USA Ultimate and taken from a data visualization, the USA Ultimate 2024 Nationals Stats Dashboard, created by Ben Ayres.

    Acknowledgement

    Photo by ALEXANDRE LALLEMAND on Unsplash

  20. Trash Wheel Collection Data

    • kaggle.com
    zip
    Updated Mar 12, 2024
    Cite
    Joakim Arvidsson (2024). Trash Wheel Collection Data [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/trash-wheel-collection-data
    Explore at:
    zip(21019 bytes)Available download formats
    Dataset updated
    Mar 12, 2024
    Authors
    Joakim Arvidsson
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Trash Wheel Collection Data

    This dataset is from Trash Wheel Collection Data from the Mr. Trash Wheel Baltimore Healthy Harbor initiative.

    Mr. Trash Wheel is a semi-autonomous trash interceptor that is placed at the end of a river, stream or other outfall. Far too lazy to chase trash around the ocean, Mr. Trash Wheel stays put and waits for the waste to flow to him. Sustainably powered and built to withstand the biggest storms, Mr. Trash Wheel uses a unique blend of solar and hydro power to pull hundreds of tons of trash out of the water each year.

    The Healthy Harbor initiative has four Trash Wheels collecting trash. Mr. Trash Wheel was the first to start, and since then three more have joined the family. The Trash Wheel Family has collected more than 2,362 tons of trash. See more about how Mr. Trash Wheel works.

    Data collection methodology

    1. When crew members are on the machine while a dumpster is being filled, they will manually count the number of each of the item types listed on a single conveyor paddle. This process is repeated several times during the dumpster filling process. An average is then calculated for the number of each item per paddle. The average is then multiplied by the paddle rate and then by the elapsed time to fill the dumpster.

    Example:
    - Paddle #1: 9 plastic bottles
    - Paddle #2: 14 plastic bottles
    - Paddle #3: 5 plastic bottles
    - Paddle #4: 12 plastic bottles
    - Average = 10 plastic bottles/paddle

    Conveyor speed = 2.5 paddles per minute; therefore an average of 25 plastic bottles is loaded each minute. If it takes 100 minutes to fill the dumpster, we estimate that there are 2,500 bottles in that dumpster.

    2. If no crew is present during the loading, we will take random bushel-size samples of the collected material and count items in these samples. A full dumpster contains approximately 325 bushels. Therefore, if an average bushel sample from a dumpster contains 3 polystyrene containers, we estimate that the dumpster contains 975 polystyrene containers.
    3. Periodically, "dumpster dives" are held where volunteers count everything in an entire dumpster. These events help validate our sampling methods and also look at what materials are present in the dumpster that are not included in our sampling categories.
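    The two estimation methods above can be written out directly, using the example figures from the text (paddle counts of 9, 14, 5 and 12; a conveyor speed of 2.5 paddles per minute; 100 minutes to fill a dumpster; 325 bushels per dumpster; 3 polystyrene containers per sampled bushel):

        # Method 1: crew counts items on sampled paddles while the dumpster fills.
        paddle_counts = [9, 14, 5, 12]                    # plastic bottles per sampled paddle
        avg_per_paddle = sum(paddle_counts) / len(paddle_counts)    # 10

        paddle_rate_per_min = 2.5
        minutes_to_fill = 100
        method1_estimate = avg_per_paddle * paddle_rate_per_min * minutes_to_fill
        print(method1_estimate)                           # 2500 bottles in the dumpster

        # Method 2: bushel sampling when no crew is present.
        items_per_bushel = 3                              # e.g. polystyrene containers
        bushels_per_dumpster = 325
        method2_estimate = items_per_bushel * bushels_per_dumpster
        print(method2_estimate)                           # 975 containers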

    What type of trash is collected the most? Do the different Trash Wheels collect different sets of trash? Are there times of the year when more or less trash is collected?

    Data Dictionary

    trashwheel.csv

    variable | class | description
    ID | character | Short name for the Trash Wheel
    Name | character | Name of the Trash Wheel
    Dumpster | double | Dumpster number
    Month | character | Month
    Year | double | Year
    Date | character | Date
    Weight | double | Weight in tons
    Volume | double | Volume in cubic yards
    PlasticBottles | double | Number of plastic bottles
    Polystyrene | double | Number of polystyrene items
    CigaretteButts | double | Number of cigarette butts
    GlassBottles | double | Number of glass bottles
    PlasticBags | double | Number of plastic bags
    Wrappers | double | Number of wrappers
    SportsBalls | double | Number of sports balls
    HomesPowered | double | Homes Powered - each ton of trash equates to on average 500 kilowatts of electricity; an average household will use 30 kilowatts per day.
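    A short sketch of how the HomesPowered figure described in the last row could be recomputed from Weight (the formula is inferred from that description, 500 kilowatts per ton and 30 kilowatts per home per day, and may not reproduce the published column exactly):

        # Recompute an estimate of homes powered from the collected weight.
        import pandas as pd

        df = pd.read_csv("trashwheel.csv")        # file name as given in the data dictionary
        df["HomesPoweredEstimate"] = df["Weight"] * 500 / 30
        print(df[["Name", "Weight", "HomesPowered", "HomesPoweredEstimate"]].head())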