45 datasets found
  1. SPHERE: Students' performance dataset of conceptual understanding,...

    • data.mendeley.com
    Updated Jan 15, 2025
    Cite
    Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
    Dataset updated
    Jan 15, 2025
    Authors
    Purwoko Haryadi Santoso
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPHERE is a students' performance dataset for physics education research (PER): a multi-domain learning dataset of students' performance in physics, collected through several research-based assessments (RBAs) established by the PER community. A total of 497 eleventh-grade students participated, drawn from three large public high schools and one small public high school located in a suburban district of a highly populated province in Indonesia. Variables related to demographics, access to literature resources, and students' physics identity were also collected.

    The RBAs used in this dataset were selected to match the concepts students learn in the Indonesian physics curriculum. We surveyed students' understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed students' scientific abilities and learning attitudes through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. Conceptual assessments continued in the second semester with the Rotational and Rolling Motion Conceptual Survey (RRMCS), the Fluid Mechanics Concept Inventory (FMCI), the Mechanical Waves Conceptual Survey (MWCS), the Thermal Concept Evaluation (TCE), and the Survey of Thermodynamic Processes and First and Second Laws (STPFaSL).

    We expect SPHERE to be a valuable dataset for advancing the PER field, particularly in quantitative studies. For example, research applying machine learning and data mining techniques in PER has faced challenges due to the lack of datasets collected specifically for PER purposes. SPHERE can be reused as a students' performance dataset in physics dedicated to PER scholars who wish to apply machine learning techniques in physics education.

  2. Data from: Crab Nebula DL3 example dataset from the LST-1 performance study

    • produccioncientifica.ucm.es
    • zenodo.org
    Updated 2024
    Cite
    Morcuende, Daniel (2024). Crab Nebula DL3 example dataset from the LST-1 performance study [Dataset]. https://produccioncientifica.ucm.es/documentos/67321d79aea56d4af048475a
    Dataset updated
    2024
    Authors
    Morcuende, Daniel
    Description

    DL3 example dataset from Crab Nebula observations with LST-1

    This repository contains a subsample of DL3 files from Crab Nebula observations used in the performance study of the Large-Sized Telescope prototype (LST-1, https://www.lst1.iac.es/) for the Cherenkov Telescope Array Observatory (CTAO, https://www.ctao.org/). The results of this performance study [1] were obtained from a larger sample of Crab Nebula observations than the one compiled here.

    This reduced dataset aims to serve as an example for analyzing data observed by one of the telescopes that will be part of the future CTAO. These data files are intended to be used in the hands-on sessions for the 1D high-level DL3 analysis in the CTAO School (https://www.school.cta-observatory.org/).

    Information about the data and the reduction process

    The DL3 files included in this repository are a subsample comprising 1.9 hours of Crab Nebula observations taken on March 4th and 5th, 2022. They were produced with cta-lstchain [2] in FITS format following the Gamma-ray Astronomy Data Format (GADF; [3]), and they can be directly read and analyzed with Gammapy [4]. Data were processed following the source-independent analysis approach described in [1]. The gamma-hadron separation and directional cuts (gammaness and theta parameters) for the gamma-ray-like event selection were chosen to keep 70% of gamma-ray-like simulated events in each bin of reconstructed energy. The point-like instrument response functions (IRFs) were produced with pyirf [5], applying the same energy-dependent efficiency cuts to simulated gamma rays in an all-sky grid of pointing positions. Final IRFs for each observation run were produced by linear interpolation among the simulated pointing nodes closest to the actual telescope pointing during the Crab Nebula observations.

    List of files

    dl3_LST-1.Run07253.fits

    dl3_LST-1.Run07254.fits

    dl3_LST-1.Run07255.fits

    dl3_LST-1.Run07256.fits

    dl3_LST-1.Run07274.fits

    dl3_LST-1.Run07275.fits

    dl3_LST-1.Run07276.fits

    dl3_LST-1.Run07277.fits

    hdu-index.fits.gz

    obs-index.fits.gz
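
    Since the files follow GADF and include the HDU and observation index tables, they can be loaded directly with Gammapy. A minimal sketch, assuming the files above have been downloaded into a local directory ./dl3 (a hypothetical path):

    ```python
    # Minimal sketch: load the DL3 files above with Gammapy's DataStore.
    # "./dl3" is a hypothetical local directory holding the listed files.
    from gammapy.data import DataStore

    # DataStore discovers observations via hdu-index.fits.gz and obs-index.fits.gz
    data_store = DataStore.from_dir("./dl3")
    print(data_store.obs_table)  # one row per observation run

    # Load all eight Crab Nebula runs and inspect their live time
    observations = data_store.get_observations()
    for obs in observations:
        print(obs.obs_id, obs.observation_live_time_duration)
    ```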

    Acknowledgements

    The production of these files has been possible thanks to the LST Collaboration work at different levels, namely, hardware and software development, data-taking, production of simulations, and data analysis.

    References

    [1] H. Abe et al 2023 ApJ 956 80 (DOI 10.3847/1538-4357/ace89d)

    [2] cta-lstchain: https://doi.org/10.5281/zenodo.10849683

    [3] Data formats for gamma-ray astronomy. https://github.com/open-gamma-ray-astro/gamma-astro-data-formats

    [4] A&A, 678, A157 (2023) DOI https://doi.org/10.1051/0004-6361/202346488

    [5] pyirf: https://doi.org/10.5281/zenodo.8348922

  3. Assessing resilience of NY drinking water system to normal and extreme...

    • catalog.data.gov
    • datasets.ai
    Updated Nov 26, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Assessing resilience of NY drinking water system to normal and extreme scenarios Dataset 072621 [Dataset]. https://catalog.data.gov/dataset/assessing-resilience-of-ny-drinking-water-system-to-normal-and-extreme-scenarios-dataset-0
    Dataset updated
    Nov 26, 2022
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Area covered
    New York
    Description

    This dataset contains the results of the Water Network Tool for Resilience (WNTR) case study application on a New York drinking water system. The data include the population impacted by the firefighting and pipe criticality analysis; the water service availability (WSA) and pressure for the loss of source water scenarios; and the modified resilience index and the combined performance index for an example pipe criticality simulation and the loss of source water scenarios. This dataset is associated with the following publication: Chu-Ketterer, L., R. Murray, P. Hassett, J. Kogan, K. Klise, and T. Haxton. Performance and Resilience Analysis of a New York Drinking Water System to Localized and System-Wide Emergencies. JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT. American Society of Civil Engineers (ASCE), Reston, VA, USA, 149(1): 05022015, (2023).
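
    WNTR itself is an open-source Python package, so the kind of analysis described above can be reproduced in outline. A minimal sketch, assuming a hypothetical EPANET input file net.inp (the actual New York system model is not part of this dataset), with the WSA formulation here being an illustrative simplification:

    ```python
    # Minimal sketch of a WNTR pressure-dependent hydraulic simulation and a
    # system-wide water service availability (WSA) estimate.
    # "net.inp" is a hypothetical EPANET input file, not the NY system model.
    import wntr

    wn = wntr.network.WaterNetworkModel("net.inp")
    wn.options.hydraulic.demand_model = "PDD"  # pressure-dependent demand

    sim = wntr.sim.WNTRSimulator(wn)
    results = sim.run_sim()

    # WSA: demand delivered at the junctions relative to expected demand
    expected = wntr.metrics.expected_demand(wn)
    delivered = results.node["demand"].loc[:, wn.junction_name_list]
    wsa = delivered.sum().sum() / expected.sum().sum()
    print(f"System-wide water service availability: {wsa:.2%}")
    ```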

  4. Vehicle Emissions and Performance Dataset

    • kaggle.com
    zip
    Updated Mar 14, 2024
    Cite
    Dennis Pogrebchtchikov (2024). Vehicle Emissions and Performance Dataset [Dataset]. https://www.kaggle.com/datasets/pogrebchtchikov/vehicle-emissions-and-performance-dataset
    Available download formats: zip (109847 bytes)
    Dataset updated
    Mar 14, 2024
    Authors
    Dennis Pogrebchtchikov
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Data Description

    File: vehicle_data.csv

    Columns:

    • Vehicle_ID: Unique identifier for each vehicle
    • Engine_Size (liters): Engine displacement
    • Cylinders: Number of cylinders
    • Fuel_Type: Fuel type (Gasoline, Diesel, Hybrid)
    • City_MPG: Fuel efficiency in city driving (miles per gallon)
    • Highway_MPG: Fuel efficiency in highway driving (miles per gallon)
    • CO2_Emissions (grams/mile): Carbon dioxide emissions

    Data Collection and Preprocessing

    Source: Data collected from resources like https://www.fueleconomy.gov/ and manufacturer websites.

    Preprocessing: Missing values were handled using mean/median imputation (depending on data distribution). Categorical features (e.g., Fuel_Type) were one-hot encoded.

    Potential Use Cases

    • Training regression models to predict CO2 emissions based on vehicle characteristics.
    • Developing classification models to categorize vehicles into emission groups (low, medium, high).
    • Building fuel consumption prediction models for route optimization and logistics.
    • Analyzing the relationship between vehicle features and environmental impact.

    Dataset Structure (vehicle_data.csv)

    Vehicle_ID,Engine_Size,Cylinders,Fuel_Type,City_MPG,Highway_MPG,CO2_Emissions
    1,2.0,4,Gasoline,28,36,320
    2,3.5,6,Gasoline,20,28,405
    3,1.8,4,Hybrid,45,52,210
    4,3.0,6,Diesel,22,30,430
    ...
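
    As a sketch of the first use case listed above (predicting CO2 emissions from vehicle characteristics), assuming the column layout shown; the train/test split and the linear model are illustrative choices:

    ```python
    # Illustrative sketch: predict CO2_Emissions from the columns shown above.
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("vehicle_data.csv")

    # One-hot encode Fuel_Type, as described in the preprocessing notes
    X = pd.get_dummies(
        df[["Engine_Size", "Cylinders", "Fuel_Type", "City_MPG", "Highway_MPG"]],
        columns=["Fuel_Type"],
    )
    y = df["CO2_Emissions"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LinearRegression().fit(X_train, y_train)
    print(f"R^2 on held-out vehicles: {model.score(X_test, y_test):.2f}")
    ```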

    Ethical Considerations

    • Responsible Use: Promote the development of AI models that support environmentally conscious decision-making in the automotive industry.
    • Bias: Strive to uncover and reduce potential biases present in the data.

    Contribution

    We welcome contributions to expand and improve this dataset.

    Example Dataset (vehicle_data.csv): Similar small-scale example datasets can be found on platforms such as:

    • Kaggle: Search for "vehicle emissions datasets" on https://www.kaggle.com/datasets
    • UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php

  5. Repository Analytics and Metrics Portal (RAMP) 2020 data

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Jul 23, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2020 data [Dataset]. http://doi.org/10.5061/dryad.dv41ns1z4
    Available download formats: zip
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Version update: The originally uploaded versions of the CSV files in this dataset included an extra column, "Unnamed: 0," which is not RAMP data and was an artifact of the process used to export the data to CSV format. This column has been removed from the revised dataset. The data are otherwise the same as in the first version.

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data from institutional repositories. The data presented here are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2020. For a description of the data collection, processing, and output methods, please see the "methods" section below.

    Methods

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
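
    Applied to one of the monthly page-clicks CSV files described below, these two steps amount to the following sketch (pandas is an assumption; the file name follows the convention given later in this record):

    ```python
    # Sketch of the two CCD steps above, applied to one page-clicks CSV.
    import pandas as pd

    df = pd.read_csv("2020-01_RAMP_all_page-clicks.csv")

    # Step 1: keep only rows flagged as citable content
    citable = df[df["citableContent"] == "Yes"]

    # Step 2: sum the clicks on those rows
    ccd = citable["clicks"].sum()
    print(f"Citable content downloads for the period: {ccd}")
    ```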
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.

    As a result, two CSV datasets are provided for each month of published data:

    page-clicks:

    The data in these CSV files correspond to the page-level data, and include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “page-clicks”. For example, the file named 2020-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2020.

    country-device-info:

    The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    index: The Elasticsearch index corresponding to country and device access information data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “country-device-info”. For example, the file named 2020-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2020.

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.

  6. Dataset for The effects of a number line intervention on calculation skills

    • figshare.mq.edu.au
    • researchdata.edu.au
    txt
    Updated May 12, 2023
    Cite
    Carola Ruiz Hornblas; Saskia Kohnen; Rebecca Bull (2023). Dataset for The effects of a number line intervention on calculation skills [Dataset]. http://doi.org/10.25949/22799717.v1
    Available download formats: txt
    Dataset updated
    May 12, 2023
    Dataset provided by
    Macquarie University
    Authors
    Carola Ruiz Hornblas; Saskia Kohnen; Rebecca Bull
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Study information

    The sample included in this dataset represents five children who participated in a number line intervention study. Six children were originally included, but one met the exclusion criterion after missing several consecutive sessions, so their data are not included in the dataset. All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible, children had to present with low mathematics achievement, performing at or below the 25th percentile on the Maths Problem Solving and/or Numerical Operations subtests of the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Children were excluded if, as reported by their parents, they had any other diagnosed disorder, such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy, or an uncorrected sensory disorder.

    The study followed a multiple baseline case series design with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase comprised two or three measurement points, the treatment phase four to seven measurement points, and the post-treatment phase one measurement point for all participants. The measurement points were distributed across participants as follows:

    • Participant 1 – 3 baseline, 6 treatment, 1 post-treatment
    • Participant 3 – 2 baseline, 7 treatment, 1 post-treatment
    • Participant 5 – 2 baseline, 5 treatment, 1 post-treatment
    • Participant 6 – 3 baseline, 4 treatment, 1 post-treatment
    • Participant 7 – 2 baseline, 5 treatment, 1 post-treatment

    In each session across all three phases, children were assessed on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task, and a number comparison task. During the treatment phase, all children additionally completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.

    Measures

    Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers comprised two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed in all phases of the study. Trained items were assessed independently of the intervention during the baseline and post-treatment phases; during the treatment phase, performance on the intervention itself is used to index performance on the trained set. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error: PAE = (|estimated number − target number| / scale of the number line) × 100.

    Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.

    Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represent the correct result. Participants were asked to select the option that was closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double and one single digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculations remained on the screen until participants responded by clicking on one of the options on the screen. Participants did not receive performance-based feedback. Performance on this task is measured by item-based accuracy.

    Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the largest one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison) the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots is kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.

    The Number Line Intervention

    During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item the participant's estimate remained visible while the correct position of the target number appeared on the number line. When the estimate's PAE was lower than 2.5, a message appeared on the screen that read "Excellent job"; when PAE was between 2.5 and 5, the message read "Well done, so close!"; and when PAE was higher than 5, the message read "Good try!" Numbers were presented in random order.

    Variables in the dataset

    • Age = age in ‘years, months’ at the start of the study
    • Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
    • Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016)
    • Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016)
    • Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016)
    • Num_Ops_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016)

    The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed by three sections. The first one refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point on the treatment phase, and post1 to the first measurement point on the post-treatment phase.

    The second part of the variable name refers to the task, as follows:

    • DC = dot comparison
    • SDC = single-digit computation
    • NLE_UT = number line estimation (untrained set)
    • NLE_T = number line estimation (trained set)
    • CE = multidigit computational estimation
    • NC = number comparison

    The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).

    Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
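
    A small helper of the kind an analyst might write to decode these names (illustrative only, not part of the dataset):

    ```python
    # Illustrative helper: decode a variable name such as "Treat3_NLE_UT_pae"
    # into phase, session, task, and measure per the naming scheme above.
    import re

    TASKS = {
        "DC": "dot comparison",
        "SDC": "single-digit computation",
        "NLE_UT": "number line estimation (untrained set)",
        "NLE_T": "number line estimation (trained set)",
        "CE": "multidigit computational estimation",
        "NC": "number comparison",
    }
    PHASES = {"Base": "baseline", "Treat": "treatment", "post": "post-treatment"}
    MEASURES = {"acc": "total correct responses", "pae": "percent absolute error"}

    def decode(name: str) -> dict:
        m = re.match(r"(Base|Treat|post)(\d+)_(.+)_(acc|pae)$", name)
        if not m:
            raise ValueError(f"Unrecognized variable name: {name}")
        phase, session, task, measure = m.groups()
        return {
            "phase": PHASES[phase],
            "session": int(session),
            "task": TASKS[task],
            "measure": MEASURES[measure],
        }

    print(decode("Treat3_NLE_UT_pae"))
    # {'phase': 'treatment', 'session': 3,
    #  'task': 'number line estimation (untrained set)',
    #  'measure': 'percent absolute error'}
    ```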

  7. Stock Portfolio Data with Prices and Indices

    • kaggle.com
    zip
    Updated Mar 23, 2025
    Cite
    Nikita Manaenkov (2025). Stock Portfolio Data with Prices and Indices [Dataset]. https://www.kaggle.com/datasets/nikitamanaenkov/stock-portfolio-data-with-prices-and-indices
    Available download formats: zip (1573175 bytes)
    Dataset updated
    Mar 23, 2025
    Authors
    Nikita Manaenkov
    License

    GNU GPL 3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This dataset consists of five CSV files that provide detailed data on a stock portfolio and related market performance over the last 5 years. It includes portfolio positions, stock prices, and major U.S. market indices (NASDAQ, S&P 500, and Dow Jones). The data is essential for conducting portfolio analysis, financial modeling, and performance tracking.

    1. Portfolio

    This file contains the portfolio composition with details about individual stock positions, including the quantity of shares, sector, and their respective weights in the portfolio. The data also includes the stock's closing price.

    • Columns:
      • Ticker: The stock symbol (e.g., AAPL, TSLA)
      • Quantity: The number of shares in the portfolio
      • Sector: The sector the stock belongs to (e.g., Technology, Healthcare)
      • Close: The closing price of the stock
      • Weight: The weight of the stock in the portfolio (as a percentage of total portfolio)

    2. Portfolio Prices

    This file contains historical pricing data for the stocks in the portfolio. It includes daily open, high, low, close prices, adjusted close prices, returns, and volume of traded stocks.

    • Columns:
      • Date: The date of the data point
      • Ticker: The stock symbol
      • Open: The opening price of the stock on that day
      • High: The highest price reached on that day
      • Low: The lowest price reached on that day
      • Close: The closing price of the stock
      • Adjusted: The adjusted closing price after stock splits and dividends
      • Returns: Daily percentage return based on close prices
      • Volume: The volume of shares traded that day

    3. NASDAQ

    This file contains historical pricing data for the NASDAQ Composite index, providing similar data as in the Portfolio Prices file, but for the NASDAQ market index.

    • Columns:
      • Date: The date of the data point
      • Ticker: The stock symbol (for NASDAQ index, this will be "IXIC")
      • Open: The opening price of the index
      • High: The highest value reached on that day
      • Low: The lowest value reached on that day
      • Close: The closing value of the index
      • Adjusted: The adjusted closing value after any corporate actions
      • Returns: Daily percentage return based on close values
      • Volume: The volume of shares traded

    4. S&P 500

    This file contains similar historical pricing data, but for the S&P 500 index, providing insights into the performance of the top 500 U.S. companies.

    • Columns:
      • Date: The date of the data point
      • Ticker: The stock symbol (for S&P 500 index, this will be "SPX")
      • Open: The opening price of the index
      • High: The highest value reached on that day
      • Low: The lowest value reached on that day
      • Close: The closing value of the index
      • Adjusted: The adjusted closing value after any corporate actions
      • Returns: Daily percentage return based on close values
      • Volume: The volume of shares traded

    5. Dow Jones

    This file contains similar historical pricing data for the Dow Jones Industrial Average, providing insights into one of the most widely followed stock market indices in the world.

    • Columns:
      • Date: The date of the data point
      • Ticker: The stock symbol (for Dow Jones index, this will be "DJI")
      • Open: The opening price of the index
      • High: The highest value reached on that day
      • Low: The lowest value reached on that day
      • Close: The closing value of the index
      • Adjusted: The adjusted closing value after any corporate actions
      • Returns: Daily percentage return based on close values
      • Volume: The volume of shares traded

    Personal Portfolio Data

    These data are retrieved using a custom framework that fetches real-time and historical stock data from Yahoo Finance. The framework builds the portfolio's data from user-specific stock holdings and performance, allowing for personalized analysis, and automatically keeps the portfolio data updated with the latest stock prices, returns, and performance metrics.

    This part of the dataset would typically involve data specific to a particular user’s stock positions, weights, and performance, which can be integrated with the other files for portfolio performance analysis.
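
    As an illustration of combining these files, the sketch below computes a daily weighted portfolio return from the first two CSVs. The file names portfolio.csv and portfolio_prices.csv are assumptions; the dataset's actual file names may differ:

    ```python
    # Illustrative sketch: daily weighted portfolio return from the
    # Portfolio and Portfolio Prices files (assumed file names).
    import pandas as pd

    weights = pd.read_csv("portfolio.csv")[["Ticker", "Weight"]]
    prices = pd.read_csv("portfolio_prices.csv", parse_dates=["Date"])

    # Attach each stock's weight to its daily return, then aggregate by date.
    # If Weight is stored as a percentage, divide it by 100 first.
    merged = prices.merge(weights, on="Ticker")
    daily = (merged["Returns"] * merged["Weight"]).groupby(merged["Date"]).sum()

    print(daily.tail())  # weighted portfolio return for the most recent days
    ```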

  8. Data from: Repository Analytics and Metrics Portal (RAMP) 2021 data

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated May 23, 2023
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2023). Repository Analytics and Metrics Portal (RAMP) 2021 data [Dataset]. http://doi.org/10.5061/dryad.1rn8pk0tz
    Available download formats: zip
    Dataset updated
    May 23, 2023
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data from institutional repositories. The data presented here are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2021. For a description of the data collection, processing, and output methods, please see the "methods" section below.

    The record will be revised periodically to make new data available through the remainder of 2021.

    Methods

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.

    As a result, two CSV datasets are provided for each month of published data:

    page-clicks:

    The data in these CSV files correspond to the page-level data, and include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “page-clicks”. For example, the file named 2021-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2021.
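
    Per-repository CCD for such a month can then be derived with a short groupby on the canonical identifier, as recommended above (a sketch; pandas is an assumption):

    ```python
    # Sketch: per-repository citable content downloads for January 2021.
    import pandas as pd

    df = pd.read_csv("2021-01_RAMP_all_page-clicks.csv")
    ccd_by_repo = (
        df[df["citableContent"] == "Yes"]
        .groupby("repository_id")["clicks"]
        .sum()
        .sort_values(ascending=False)
    )
    print(ccd_by_repo.head(10))  # ten repositories with the most CCD that month
    ```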

    country-device-info:

    The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    index: The Elasticsearch index corresponding to country and device access information data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “country-device-info”. For example, the file named 2021-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2021.

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.

  9. Data from: iCalendar: Satellite-based Field Map Calendar

    • datasets.ai
    • catalog.data.gov
    Updated May 31, 2024
    Cite
    Department of Agriculture (2024). iCalendar: Satellite-based Field Map Calendar [Dataset]. https://datasets.ai/datasets/icalendar-satellite-based-field-map-calendar
    Available download formats
    Dataset updated
    May 31, 2024
    Dataset authored and provided by
    Department of Agriculture
    Description

    GUI-based software coded in Python to support high-throughput image processing and analytics of large satellite imagery datasets, providing spatiotemporal monitoring of crop health conditions throughout the growing season by automatically illustrating 1) a field map calendar (FMC) with daily thumbnails of vegetation heatmaps in each month and 2) a seasonal Vegetation Index (VI) profile of the crop fields. Output examples of the FMC and VI profile are found in the files fmCalendar.jpg and NDVI_Profile.jpg, respectively, which were created from satellite imagery acquired May 1 through October 31, 2020, over a sugarbeet field in Moorhead, MN.

  10. Repository Analytics and Metrics Portal (RAMP) 2017 data

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jul 27, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2017 data [Dataset]. http://doi.org/10.5061/dryad.r7sqv9scf
    Available download formats: zip
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data from institutional repositories. The data presented here are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2017. For a description of the data collection, processing, and output methods, please see the "methods" section below.

    Methods

    RAMP Data Documentation – January 1, 2017 through August 18, 2018

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    

    Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
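
    The flagging step might look like the following sketch, assuming it reduces to a file-extension test on the URL path (the exact rules RAMP applies are not spelled out in this record):

    ```python
    # Sketch of the citableContent flag described above, assuming a simple
    # file-extension test; RAMP's actual rules may differ.
    from urllib.parse import urlparse

    CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip"}

    def citable_content(url: str) -> str:
        path = urlparse(url).path.lower()
        is_file = any(path.endswith(ext) for ext in CONTENT_EXTENSIONS)
        return "Yes" if is_file else "No"

    print(citable_content("https://repo.example.edu/handle/123/456"))           # No
    print(citable_content("https://repo.example.edu/bitstream/123/456/a.pdf"))  # Yes
    ```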

    Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

    The data in these CSV files include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data follow the format 2017-01_RAMP_all.csv. Using this example, the file 2017-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2017.

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.

  11. Data from: MassyTools: A High-Throughput Targeted Data Processing Tool for...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Jun 3, 2023
    Cite
    Bas C. Jansen; Karli R. Reiding; Albert Bondt; Agnes L. Hipgrave Ederveen; Magnus Palmblad; David Falck; Manfred Wuhrer (2023). MassyTools: A High-Throughput Targeted Data Processing Tool for Relative Quantitation and Quality Control Developed for Glycomic and Glycoproteomic MALDI-MS [Dataset]. http://doi.org/10.1021/acs.jproteome.5b00658.s001
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Bas C. Jansen; Karli R. Reiding; Albert Bondt; Agnes L. Hipgrave Ederveen; Magnus Palmblad; David Falck; Manfred Wuhrer
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The study of N-linked glycosylation has long been complicated by a lack of bioinformatics tools. In particular, there is still a lack of fast and robust data processing tools for targeted (relative) quantitation. We have developed modular, high-throughput data processing software, MassyTools, that is capable of calibrating spectra, extracting data, and performing quality control calculations based on a user-defined list of glycan or glycopeptide compositions. Typical examples of output include relative areas after background subtraction, isotopic pattern-based quality scores, spectral quality scores, and signal-to-noise ratios. We demonstrated MassyTools’ performance on MALDI-TOF-MS glycan and glycopeptide data from different samples. MassyTools yielded better calibration than the commercial software flexAnalysis, generally showing 2-fold better ppm errors after internal calibration. Relative quantitation using MassyTools and flexAnalysis gave similar results, yielding a relative standard deviation (RSD) of the main glycan of ∼6%. However, MassyTools yielded 2- to 5-fold lower RSD values for low-abundant analytes than flexAnalysis. Additionally, feature curation based on the computed quality criteria improved the data quality. In conclusion, we show that MassyTools is a robust automated data processing tool for high-throughput, high-performance glycosylation analysis. The package is released under the Apache 2.0 license and is freely available on GitHub (https://github.com/Tarskin/MassyTools).

  12. Pre-processed B-cell receptor amplicon sequencing data from SRR1842411

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 24, 2020
    Cite
    Adaptive Immunity Group (2020). Pre-processed B-cell receptor amplicon sequencing data from SRR1842411 [Dataset]. http://doi.org/10.5281/zenodo.806864
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Adaptive Immunity Group
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example dataset containing B-cell receptor (BCR) gene sequences. This dataset is intended to be used for testing software tools developed to annotate (i.e. map Variable, Diversity and Joining segments) and perform clonal analysis of BCR sequencing data.

    Sequencing:

    Libraries were prepared using 5'RACE from PBMCs of a healthy donor. Input molecules were tagged with unique molecular identifiers (UMIs). Sequencing was run on a MiSeq with 300+300 bp reads.

    Contents:

    The dataset contains both raw sequencing reads and high-quality consensus sequences assembled using a unique molecular tagging (UMI) approach. Consensus assembly corrects sequencing errors and eliminates sequencing artifacts.

    • age_ig_s7_R1.fastq.gz and age_ig_s7_R2.fastq.gz contain raw reads
    • age_ig_s7_R1.t10.cf.fastq.gz and age_ig_s7_R2.t10.cf.fastq.gz contain consensus sequences

    All files contain a UMI tag sequence in their header, in the form UMI:NNNN:QQQQ, where N is a base character and Q is a quality character (for assembled consensuses, the total number of supporting reads is given instead of the Q string).

    Note that consensus sequences were assembled using only raw sequences whose UMI tags were supported by at least 10 sequencing reads, so the consensus sequence files contain a subset of all UMI tags found in the raw sequences. Thus, to assess software performance on raw sequencing reads using the assembled consensus sequences as a high-quality data standard, the raw reads should first be filtered to contain only those UMI tags that are present in the consensus sequence file (see the sketch below).
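    A filtering step of this kind might look like the following Python sketch. It assumes the UMI:NNNN:QQQQ tag appears as a whitespace-separated field in the FASTQ header, and uses the R1 filenames listed above; the output filename is hypothetical:

        import gzip

        def umi_from_header(header: str) -> str:
            # Headers carry a tag of the form UMI:NNNN:QQQQ; the middle field is the UMI.
            for field in header.split():
                if field.startswith("UMI:"):
                    return field.split(":")[1]
            raise ValueError(f"no UMI tag in header: {header}")

        def read_fastq(path):
            # Yield (header, sequence, plus, quality) records from a gzipped FASTQ file.
            with gzip.open(path, "rt") as handle:
                while True:
                    record = [handle.readline().rstrip() for _ in range(4)]
                    if not record[0]:
                        return
                    yield tuple(record)

        # UMIs that survived consensus assembly (>= 10 supporting reads).
        consensus_umis = {umi_from_header(rec[0])
                          for rec in read_fastq("age_ig_s7_R1.t10.cf.fastq.gz")}

        # Keep only raw reads whose UMI is present in the consensus file.
        with gzip.open("age_ig_s7_R1.filtered.fastq.gz", "wt") as out:
            for rec in read_fastq("age_ig_s7_R1.fastq.gz"):
                if umi_from_header(rec[0]) in consensus_umis:
                    out.write("\n".join(rec) + "\n")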

    Citations:

    The whole dataset was used to benchmark the MiXCR software and was originally referenced in: Bolotin DA et al., MiXCR: software for comprehensive adaptive immunity profiling. Nature Methods 12(5):380–381, 2015.

    Data pre-processing was carried out using the MIGEC software: Shugay M et al., Towards error-free profiling of immune repertoires. Nature Methods 11(6):653–655, 2014.

    Contributors:

    The dataset was generated in Prof. Chudakov's lab (Adaptive Immunity Group at Masaryk University, Brno, and the Genomics of Adaptive Immunity Lab at the Institute of Bioorganic Chemistry, Moscow). Sample preparation and sequencing were performed by Dr. Olga Britanova and Dr. Maria Turchaninova. Raw sequencing reads were pre-processed and uploaded by Dr. Mikhail Shugay.

  13. Repository Analytics and Metrics Portal (RAMP) 2018 data

    • data.niaid.nih.gov
    • dataone.org
    • +1 more
    zip
    Updated Jul 27, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2018 data [Dataset]. http://doi.org/10.5061/dryad.ffbg79cvp
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data for institutional repositories. The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "methods" section below. Note that the RAMP data model changed in August 2018, and two sets of documentation are provided to describe data collection and processing before and after the change.

    Methods

    RAMP Data Documentation – January 1, 2017 through August 18, 2018

    Data Collection

    RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    

    Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
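    For orientation, here is a minimal sketch of the kind of Search Analytics query described above, using the google-api-python-client bindings for the v3 ("webmasters") API linked above. The credentials file and site URL are hypothetical placeholders, and this is not RAMP's actual harvester code:

        from google.oauth2 import service_account
        from googleapiclient.discovery import build

        # Hypothetical service-account credentials with read-only Search Console access.
        creds = service_account.Credentials.from_service_account_file(
            "service-account.json",
            scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
        )
        service = build("webmasters", "v3", credentials=creds)

        response = service.searchanalytics().query(
            siteUrl="https://repository.example.edu/",
            body={
                "startDate": "2018-01-01",
                "endDate": "2018-01-31",
                "dimensions": ["page", "date"],  # add "country" and "device" for the second data set
                "rowLimit": 5000,
            },
        ).execute()

        for row in response.get("rows", []):
            page, date = row["keys"]
            print(page, date, row["clicks"], row["impressions"], row["ctr"], row["position"])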

    Data Processing

    Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
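    The flagging step can be pictured with a small sketch; the extension list here is illustrative only and is an assumption, not RAMP's actual rule set:

        from urllib.parse import urlparse

        # Illustrative extension list; RAMP's actual rules are not reproduced here.
        CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip", ".txt"}

        def citable_content(url: str) -> str:
            """Return 'Yes' if the URL points to a content file rather than an HTML page."""
            path = urlparse(url).path.lower()
            return "Yes" if any(path.endswith(ext) for ext in CONTENT_EXTENSIONS) else "No"

        print(citable_content("https://repo.example.edu/handle/123/456"))                # No
        print(citable_content("https://repo.example.edu/bitstream/123/456/thesis.pdf"))  # Yes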

    Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are as follows (a short sketch follows the list):

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
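    As a concrete illustration, here is a minimal pandas sketch of this calculation over one of the published monthly CSV files; the filename follows the convention described below, and the citableContent and clicks fields are as documented above:

        import pandas as pd

        # One row per page/URL, as described in the output documentation.
        ramp = pd.read_csv("2018-01_RAMP_all.csv")

        # Step 1: keep only rows flagged as citable content.
        citable = ramp[ramp["citableContent"] == "Yes"]

        # Step 2: sum clicks over those rows to obtain CCD.
        ccd = int(citable["clicks"].sum())
        print(f"CCD for January 2018: {ccd}")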
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

    The data in these CSV files include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.

    Data Collection from August 19, 2018 Onward

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped by the country from which the user submitted the corresponding search and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

  14. An example of the Row Table.

    • figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Hsien-Tsung Chang; Tsai-Huei Lin (2023). An example of the Row Table. [Dataset]. http://doi.org/10.1371/journal.pone.0168935.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hsien-Tsung Chang; Tsai-Huei Lin
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example of the Row Table.

  15. Egyptian Stock Exchange (EGX)

    • data.mendeley.com
    Updated Nov 20, 2020
    + more versions
    Cite
    Essam Houssein (2020). Egyptian Stock Exchange (EGX) [Dataset]. http://doi.org/10.17632/7chdr568x7.1
    Explore at:
    Dataset updated
    Nov 20, 2020
    Authors
    Essam Houssein
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study is based on historical data for several indicators of the Egyptian Stock Exchange (EGX), with the aim of building a high-accuracy prediction model. The data were purchased from Egypt for Information Dissemination (EGID), a governmental organization that provides data for EGX. The data cover six stock market indices. The first, the EGX-30 index, is quoted in local currency and also denominated in US dollars; it measures the top 30 firms in liquidity and activity. The second, EGX-30-Capped, is designed to track the performance of the most traded companies in accordance with the rules set for mutual funds. The third, EGX-70, aims to provide investors with wider tools for monitoring market performance. The fourth, the EGX-100 index, evaluates the performance of the 100 most active firms, comprising the 30 constituents of the EGX-30 index and the 70 of the EGX-70 index. The NILE index avoids concentration in any one industry and therefore gives a good representation of the various industries/sectors in the economy; it is weighted by market capitalization and adjusted by free float. The last, EGX-50-EWI, tracks the top 50 companies in liquidity and activity, and is designed to balance the impact of price changes among its constituents, which are given a fixed weight of 2% at each quarterly review.

  16. UCI SECOM Dataset

    • kaggle.com
    zip
    Updated May 28, 2018
    Cite
    Paresh Mathur (2018). UCI SECOM Dataset [Dataset]. https://www.kaggle.com/paresh2047/uci-semcom
    Explore at:
    Available download formats: zip (2127320 bytes)
    Dataset updated
    May 28, 2018
    Authors
    Paresh Mathur
    License

    Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    Context

    Manufacturing process feature selection and categorization

    Content

    Abstract: Data from a semi-conductor manufacturing process

    • Data Set Characteristics: Multivariate
    • Number of Instances: 1567
    • Area: Computer
    • Attribute Characteristics: Real
    • Number of Attributes: 591
    • Date Donated: 2008-11-19
    • Associated Tasks: Classification, Causal-Discovery
    • Missing Values? Yes

    A complex modern semiconductor manufacturing process is normally under constant surveillance via the monitoring of signals/variables collected from sensors and/or process measurement points. However, not all of these signals are equally valuable in a specific monitoring system. The measured signals contain a combination of useful information, irrelevant information, and noise, and the useful information is often buried in the latter two. Engineers typically have far more signals than are actually required. If we consider each type of signal as a feature, then feature selection can be applied to identify the most relevant signals. Process engineers may then use these signals to determine key factors contributing to yield excursions downstream in the process. This enables increased process throughput, decreased time to learning, and reduced per-unit production costs.

    To enhance current business improvement techniques the application of feature selection as an intelligent systems technique is being investigated.

    The dataset presented here represents a selection of such features, where each example is a single production entity with its associated measured features; the labels represent a simple pass/fail yield outcome from in-house line testing together with an associated date-time stamp. A label of −1 corresponds to a pass and 1 to a fail, and the date-time stamp is that of the specific test point.

    Using feature selection techniques it is desired to rank features according to their impact on the overall yield for the product, causal relationships may also be considered with a view to identifying the key features.

    Results may be submitted in terms of feature relevance for predictability, using error rates as the evaluation metric. It is suggested that cross-validation be applied to generate these results. Some baseline results are shown below for basic feature selection techniques using a simple kernel ridge classifier and 10-fold cross-validation.

    Baseline results: pre-processing objects were applied to the dataset simply to standardize the data and remove the constant features. A number of different feature selection objects, each selecting the 40 highest-ranked features, were then applied with a simple classifier to achieve some initial results. 10-fold cross-validation was used, and the balanced error rate (BER) was generated as the initial performance metric for investigating this dataset.

    SECOM dataset: 1567 examples, 591 features, 104 fails

    FS method (40 features)   BER %        True + %     True − %
    S2N (signal to noise)     34.5 ±2.6    57.8 ±5.3    73.1 ±2.1
    Ttest                     33.7 ±2.1    59.6 ±4.7    73.0 ±1.8
    Relief                    40.1 ±2.8    48.3 ±5.9    71.6 ±3.2
    Pearson                   34.1 ±2.0    57.4 ±4.3    74.4 ±4.9
    Ftest                     33.5 ±2.2    59.1 ±4.8    73.8 ±1.8
    Gram Schmidt              35.6 ±2.4    51.2 ±11.8   77.5 ±2.3
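    A comparable baseline can be sketched with scikit-learn. This is an approximation under stated assumptions, not the original evaluation code: RidgeClassifier stands in for the kernel ridge classifier, a median imputer (not specified above) handles the missing values, and the filenames are those of the original UCI distribution:

        import numpy as np
        from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
        from sklearn.impute import SimpleImputer
        from sklearn.linear_model import RidgeClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # 1567 x 591 matrix, space-separated, with 'NaN' for missing values.
        X = np.loadtxt("secom.data")
        # First column of the labels file: -1 = pass, 1 = fail.
        y = np.loadtxt("secom_labels.data", usecols=0)

        pipe = make_pipeline(
            SimpleImputer(strategy="median"),          # imputation choice is assumed
            VarianceThreshold(),                       # remove constant features
            StandardScaler(),                          # standardize the data
            SelectKBest(f_classif, k=40),              # 40 highest-ranked features (F-test)
            RidgeClassifier(class_weight="balanced"),  # stand-in for kernel ridge classifier
        )

        # Balanced error rate (BER) = 1 - balanced accuracy.
        bacc = cross_val_score(pipe, X, y, cv=10, scoring="balanced_accuracy")
        print(f"BER: {100 * (1 - bacc.mean()):.1f}% +- {100 * bacc.std():.1f}")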

    Attribute Information:

    Key facts: Data structure: the data consist of two files: the dataset file SECOM, containing 1567 examples each with 591 features (a 1567 × 591 matrix), and a labels file containing the classification and date-time stamp for each example.

    As with any real-life data situation, this data contains null values of varying intensity depending on the individual features. This needs to be taken into consideration when investigating the data, either through pre-processing or within the technique applied.

    The data are represented in a raw text file, each line representing an individual example with the features separated by spaces. Null values are represented by the 'NaN' value, as per MATLAB.
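    For example, the raw text file can be loaded and its missingness inspected as follows (assuming the original UCI filename):

        import pandas as pd

        # Whitespace-separated raw text; the literal token 'NaN' is parsed as missing.
        secom = pd.read_csv("secom.data", sep=r"\s+", header=None)

        print(secom.shape)                                   # expected: (1567, 591)
        print(int(secom.isna().sum().sum()), "missing values in total")
        # Missingness varies by feature, so inspect the worst columns first:
        print(secom.isna().mean().sort_values(ascending=False).head())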

    Acknowledgements

    Authors: Michael McCann, Adrian Johnston

    Inspiration

    • Semiconductor manufacturing involves a multi-dimensional description of each process. Can we find a key performance index by using big data techniques?
  17. Student Performance Data Set

    • kaggle.com
    zip
    Updated Mar 27, 2020
    + more versions
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Explore at:
    Available download formats: zip (12353 bytes)
    Dataset updated
    Mar 27, 2020
    Authors
    Data-Science Sean
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    If this data set is useful, an upvote is appreciated. The data describe student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final-year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see the paper source for more details, and the sketch below).
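    As a sketch of the note about G1 and G2 (not the models of Cortez and Silva, 2008), one can compare cross-validated predictions of G3 with and without the earlier-period grades; the filename and semicolon separator follow the original UCI distribution:

        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import cross_val_score

        # Load the Mathematics file (semicolon-separated in the UCI distribution).
        df = pd.read_csv("student-mat.csv", sep=";")

        y = df["G3"]
        X = pd.get_dummies(df.drop(columns=["G3"]))  # one-hot encode categorical attributes

        for dropped in ([], ["G1", "G2"]):
            Xd = X.drop(columns=dropped)
            scores = cross_val_score(RandomForestRegressor(random_state=0), Xd, y,
                                     cv=5, scoring="r2")
            label = "without G1/G2" if dropped else "with G1/G2"
            print(f"{label}: mean R^2 = {scores.mean():.2f}")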

  18. List of all 81 selected documents.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Alex Sebastião Constâncio; Denise Fukumi Tsunoda; Helena de Fátima Nunes Silva; Jocelaine Martins da Silveira; Deborah Ribeiro Carvalho (2023). List of all 81 selected documents. [Dataset]. http://doi.org/10.1371/journal.pone.0281323.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alex Sebastião Constâncio; Denise Fukumi Tsunoda; Helena de Fátima Nunes Silva; Jocelaine Martins da Silveira; Deborah Ribeiro Carvalho
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    List of all 81 selected documents.

  19. Open data: Is Auditory Awareness Negativity Confounded by Performance?

    • demo.researchdata.se
    • researchdata.se
    • +2 more
    Updated Jul 6, 2020
    Cite
    Stefan Wiens; Rasmus Eklund (2020). Open data: Is Auditory Awareness Negativity Confounded by Performance? [Dataset]. http://doi.org/10.17045/STHLMUNI.9724280
    Explore at:
    Dataset updated
    Jul 6, 2020
    Dataset provided by
    Stockholm University
    Authors
    Stefan Wiens; Rasmus Eklund
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The main file is performance_correction.html in AAN3_analysis_scripts.zip. It contains the results of the main analyses.

    See AAN3_readme_figshare.txt:

    1. Title of Dataset: Open data: Is auditory awareness negativity confounded by performance?

    1. Author Information
    A. Principal Investigator Contact Information
    Name: Stefan Wiens
    Institution: Department of Psychology, Stockholm University, Sweden
    Internet: https://www.su.se/profiles/swiens-1.184142
    Email: sws@psychology.su.se

    B. Associate or Co-investigator Contact Information
    Name: Rasmus Eklund
    Institution: Department of Psychology, Stockholm University, Sweden
    Internet: https://www.su.se/profiles/raek2031-1.223133
    Email: rasmus.eklund@psychology.su.se

    C. Associate or Co-investigator Contact Information
    Name: Billy Gerdfeldter
    Institution: Department of Psychology, Stockholm University, Sweden
    Internet: https://www.su.se/profiles/bige1544-1.403208
    Email: billy.gerdfeldter@psychology.su.se

    2. Date of data collection: Subjects (N = 28) were tested between 2018-12-03 and 2019-01-18.

    3. Geographic location of data collection: Department of Psychology, Stockholm, Sweden

    4. Information about funding sources that supported the collection of the data: Swedish Research Council / Vetenskapsrådet (Grant 2015-01181) Marianne and Marcus Wallenberg (Grant 2019-0102)

    SHARING/ACCESS INFORMATION

    1. Licenses/restrictions placed on the data: CC BY 4.0

    2. Links to publications that cite or use the data: Eklund R., Gerdfeldter B., & Wiens S. (2020). Is auditory awareness negativity confounded by performance? Consciousness and Cognition. https://doi.org/10.1016/j.concog.2020.102954

    The study was preregistered: https://doi.org/10.17605/OSF.IO/W4U7V

    1. Links to other publicly accessible locations of the data: N/A

    2. Links/relationships to ancillary data sets: N/A

    3. Was data derived from another source? No

    4. Recommended citation for this dataset: Eklund R., Gerdfeldter B., & Wiens S. (2020). Open data: Is auditory awareness negativity confounded by performance? Stockholm: Stockholm University. https://doi.org/10.17045/sthlmuni.9724280

    DATA & FILE OVERVIEW

    File List: The files contain the raw data, scripts, and results of main and supplementary analyses of the electroencephalography (EEG) study. Links to the hardware and software are provided under methodological information.

    AAN3_experiment_scripts.zip: contains the Python files to run the experiment

    AAN3_rawdata_EEG.zip: contains raw EEG data files for each subject in .bdf format (generated by Biosemi)

    AAN3_rawdata_log.zip: contains log files of the EEG session (generated by Python)

    AAN3_EEG_scripts.zip: Python-MNE scripts to process and to analyze the EEG data

    AAN3_EEG_source_localization_scripts.zip: Python-MNE files needed for source localization. The template MRI is provided in this zip. The files are obtained from the MNE tutorial (https://mne.tools/stable/auto_tutorials/source-modeling/plot_eeg_no_mri.html?highlight=template). Note that the stc folder is empty. The source time course files are not provided because of their large size. They can quickly be generated from the analysis script. They are needed for the source localization.

    AAN3_analysis_scripts.zip: R scripts to analyze the data. The main file is performance_correction.html. It contains the results of the main analyses.

    AAN3_results.zip: contains summary data files, figures, and tables that are created by Python-MNE and R.

    METHODOLOGICAL INFORMATION

    1. Description of methods used for collection/generation of data: The auditory stimuli were two 100-ms tones (f = 900 Hz and 1400 Hz, 5-ms fade-in and fade-out). The experiment was programmed in Python (https://www.python.org/) and used extra functions from https://github.com/stamnosslin/mn. The EEG data were recorded with an Active Two BioSemi system (BioSemi, Amsterdam, Netherlands; www.biosemi.com) and saved in .bdf format. For more information, see the linked publication.
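    For orientation, the raw BioSemi recordings can be opened with MNE-Python's BDF reader; the filename and the 'Status' trigger-channel name below are assumptions, not taken from the authors' scripts:

        import mne

        # Load one subject's raw recording (filename is hypothetical).
        raw = mne.io.read_raw_bdf("AAN3_sub01.bdf", preload=True)
        print(raw.info)  # channel names, sampling rate, etc.

        # BioSemi systems typically record trigger events on a 'Status' channel.
        events = mne.find_events(raw, stim_channel="Status")
        print(events[:5])  # rows of [sample, 0, event_id]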

    2. Methods for processing the data: We computed event-related potentials and source localization. See the linked publication.

    3. Instrument- or software-specific information needed to interpret the data:
    MNE-Python (Gramfort A. et al., 2013): https://mne.tools/stable/index.html#
    RStudio used with R (R Core Team, 2016): https://rstudio.com/products/rstudio/
    Wiens, S. (2017). Aladins Bayes Factor in R (Version 3). https://www.doi.org/10.17045/sthlmuni.4981154.v3

    4. Standards and calibration information, if appropriate: For information, see linked publication.

    5. Environmental/experimental conditions: For information, see linked publication.

    6. Describe any quality-assurance procedures performed on the data: For information, see linked publication.

    7. People involved with sample collection, processing, analysis and/or submission:

    • Data collection: Rasmus Eklund with assistance from Billy Gerdfeldter.
    • Data processing, analysis, and submission: Rasmus Eklund and Stefan Wiens

    DATA-SPECIFIC INFORMATION: All relevant information can be found in the MNE-Python and R scripts (in EEG_scripts and analysis_scripts folders) that process the raw data. For example, we added notes to explain what different variables mean.

    The folder structure needs to be as follows: AAN3 (main folder) --->data --->--->bdf (AAN3_rawdata_EEG) --->--->log (AAN3_rawdata_log) --->--->raw (empty) --->MNE (AAN3_EEG_scripts) --->R (AAN3_analysis_scripts) --->results (AAN3_results) --->source (AAN3_EEG_source_localization_files)

    To run the MNE-Python scripts: Anaconda was used with MNE-Python 0.20 (see installation at https://mne.tools/stable/index.html#). For Downsample_AAN3, ICA_raw_AAN3, Preprocess_AAN3, Make_inverse_operator_AAN3.py, BehaviorTables_AAN3, and PlotSource, the complete scripts should be run (from the Anaconda prompt). For Analysis_AAN3, one section at a time should be run (from Spyder).

  20. Comparative experiments of multimodal sentiment analysis models on the dataset CMU-MOSEI.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    + more versions
    Cite
    Ji Mingyu; Zhou Jiawei; Wei Ning (2023). Comparative experiments of multimodal sentiment analysis models on the dataset CMU-MOSEI. [Dataset]. http://doi.org/10.1371/journal.pone.0273936.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ji Mingyu; Zhou Jiawei; Wei Ning
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparative experiments of multimodal sentiment analysis models on the dataset CMU-MOSEI.
