95 datasets found
  1. Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure...

    • heidata.uni-heidelberg.de
    pdf, tsv, txt
    Updated Nov 20, 2024
    Cite
    Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich (2024). Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure Score Analytics [data] [Dataset]. http://doi.org/10.11588/DATA/MXM0Q2
    Available download formats: tsv(197975), tsv(190296), tsv(191831), pdf(640128), tsv(107100), txt(3421), tsv(286102), tsv(106632)
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    heiDATA
    Authors
    Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/MXM0Q2

    Description

    In the publication [1], we implemented anonymization and synthetization techniques for a structured data set collected during the HiGHmed Use Case Cardiology study [2]. We employed the data anonymization tool ARX [3] and the data synthetization framework ASyH [4], individually and in combination. We evaluated the utility and shortcomings of the different approaches through statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores (Barcelona BioHF [5] and MAGGIC [6]) on the protected data sets. We observed only minimal deviations from the scores computed on the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats. We could demonstrate that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches, and a combination thereof, introduce only minimal deviations from the original data set across all features. While data synthesis techniques can produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impact on data utility. We hereby share all generated data sets with the scientific community through a use and access agreement.

    [1] Johann TI, Otte K, Prasser F, Dieterich C. Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics. Eur Heart J 2024. doi:10.1093/ehjdh/ztae083
    [2] Sommer KK, Amr A, Bavendiek, Beierle F, Brunecker P, Dathe H, et al. Structured, harmonized, and interoperable integration of clinical routine data to compute heart failure risk scores. Life (Basel) 2022;12:749.
    [3] Prasser F, Eicher J, Spengler H, Bild R, Kuhn KA. Flexible data anonymization using ARX—current status and challenges ahead. Softw Pract Exper 2020;50:1277–1304.
    [4] Johann TI, Wilhelmi H. ASyH—anonymous synthesizer for health data, GitHub, 2023. Available at: https://github.com/dieterich-lab/ASyH.
    [5] Lupón J, de Antonio M, Vila J, Peñafiel J, Galán A, Zamora E, et al. Development of a novel heart failure risk tool: the Barcelona bio-heart failure risk calculator (BCN Bio-HF calculator). PLoS One 2014;9:e85466.
    [6] Pocock SJ, Ariti CA, McMurray JJV, Maggioni A, Køber L, Squire IB, et al. Predicting survival in heart failure: a risk score based on 39 372 patients from 30 studies. Eur Heart J 2013;34:1404–1413.

  2. Data from: Extracted and Anonymised Qualitative Data on Students' Acceptance...

    • recerca.uoc.edu
    Updated 2022
    Cite
    Raffaghelli, Juliana Elisa; Loria-Soriano, Eugenia; González, M. Elena Rodríguez; Bañeres, David; Guerrero-Roldán, Ana Elena (2022). Extracted and Anonymised Qualitative Data on Students' Acceptance of an Early Warning System [Dataset]. https://recerca.uoc.edu/documentos/67321ecaaea56d4af0485d1a
    Dataset updated
    2022
    Authors
    Raffaghelli, Juliana Elisa; Loria-Soriano, Eugenia; González, M. Elena Rodríguez; Bañeres, David; Guerrero-Roldán, Ana Elena
    Description

    The data published in this record were used in the following study: "Exploring Higher Education students' experience with AI-powered educational tools: The case of an Early Warning System".

    The study analyses students' experience of an early warning system developed at a fully online university. It is based on 21 semi-structured interviews that yielded a corpus of 21,761 words, to which a mixed inductive and deductive coding approach was applied after thematic analysis. We focused on 11 themes, 52 subthemes, and 396 coded segments to perform content analysis. Our findings revealed that the students, primarily senior workers with a high level of academic self-efficacy, had little experience with this type of system and low expectations of it. However, using the tool triggered interest and meaningful reflections on it. A comparative analysis between disciplines related to Computer Science and to Economics showed higher confidence in, and expectations of, the system and artificial intelligence overall in the first group. These results highlight the importance of supporting students' further experience with and understanding of artificial intelligence systems in education, so that they can accept such systems and, above all, participate in the iterative development of such tools to achieve quality, relevance, and fairness.

    The records attached as part of the dataset include:
    1- The General CodeTree with exemplar coding excerpts in Spanish
    2- Extract of transcriptions in English
    3- Full Report in Spanish as extracted from NVIVO, including the extracted codes for the synthesis (1,2) in blue, and the comments made by the two researchers engaged in the interrater agreement.
    4- General Content Analysis (Spreadsheet ODS)

  3. Anonymized data for statstics course for Summer trainees.

    • figshare.com
    txt
    Updated Jun 27, 2023
    Cite
    Clifton D. Fuller (2023). Anonymized data for statstics course for Summer trainees. [Dataset]. http://doi.org/10.6084/m9.figshare.23586663.v1
    Available download formats: txt
    Dataset updated
    Jun 27, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Clifton D. Fuller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Anonymized dataset for MD Anderson Summer Student Stats Course.

  4. 👨‍🎓 Open University Learning Analytics

    • kaggle.com
    zip
    Updated Mar 5, 2024
    Cite
    mexwell (2024). 👨‍🎓 Open University Learning Analytics [Dataset]. https://www.kaggle.com/datasets/mexwell/open-university-learning-analytics
    Available download formats: zip(44198573 bytes)
    Dataset updated
    Mar 5, 2024
    Authors
    mexwell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset introduces the anonymised Open University Learning Analytics Dataset (OULAD). It contains data about courses, students and their interactions with Virtual Learning Environment (VLE) for seven selected courses (called modules). Presentations of courses start in February and October - they are marked by “B” and “J” respectively. The dataset consists of tables connected using unique identifiers. All tables are stored in the csv format.

    Database schema

    [Image: database schema diagram, https://analyse.kmi.open.ac.uk/resources/images/model.png]

    courses.csv This file contains the list of all available modules and their presentations. The columns are:

    • code_module – code name of the module, which serves as the identifier.
    • code_presentation – code name of the presentation. It consists of the year and “B” for a presentation starting in February or “J” for a presentation starting in October.
    • length – length of the module-presentation in days.

    The structure of B and J presentations may differ, and therefore it is good practice to analyse the B and J presentations separately. Nevertheless, for some presentations the corresponding previous B/J presentation does not exist, and therefore the J presentation must be used to inform the B presentation or vice versa. In the dataset this is the case for the CCC, EEE and GGG modules.
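The naming rule for code_presentation, and the advice to analyse B and J presentations separately, can be sketched as follows. This is only an illustration under stated assumptions: the helper names `parse_presentation` and `group_by_start` and the sample codes are hypothetical and are not part of the dataset or its tooling.

```python
from collections import defaultdict

# Start month for each presentation suffix, as stated in the dataset description.
START_MONTH = {"B": "February", "J": "October"}


def parse_presentation(code_presentation: str) -> tuple[int, str]:
    """Split a code such as '2013J' into (year, start month)."""
    if len(code_presentation) < 2:
        raise ValueError(f"unexpected code_presentation: {code_presentation!r}")
    year, suffix = code_presentation[:-1], code_presentation[-1]
    if suffix not in START_MONTH or not year.isdigit():
        raise ValueError(f"unexpected code_presentation: {code_presentation!r}")
    return int(year), START_MONTH[suffix]


def group_by_start(codes: list[str]) -> dict[str, list[str]]:
    """Group presentation codes by start month so B and J can be analysed apart."""
    groups: dict[str, list[str]] = defaultdict(list)
    for code in codes:
        _, month = parse_presentation(code)
        groups[month].append(code)
    return dict(groups)


# Hypothetical sample values, for illustration only.
sample_codes = ["2013B", "2013J", "2014B", "2014J"]
groups = group_by_start(sample_codes)
print(groups)
```

Grouping on the suffix first keeps the February and October cohorts apart before any further analysis is run on them.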

    assessments.csv This file contains information about assessments in module-presentations. Usually, every presentation has a number of assessments followed by the final exam. CSV contains columns:

    • code_module – identification code of the module, to which the assessment belongs.
    • code_presentation - identification code of the presentation, to which the assessment belongs.
    • id_assessment – identification number of the assessment.
    • assessment_type – type of assessment. Three types of assessments exist: Tutor Marked Assessment (TMA), Computer Marked Assessment (CMA) and Final Exam (Exam).
    • date – information about the final submission date of the assessment calculated as the number of days since the start of the module-presentation. The starting date of the presentation has number 0 (zero).
    • weight – weight of the assessment in %. Typically, Exams are treated separately and have a weight of 100%; the weights of all other assessments sum to 100%.

    If the final exam date is missing, the exam takes place at the end of the last presentation week.
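The weighting rule described for assessments.csv (exams weighted 100% on their own, all other assessments summing to 100%) lends itself to a quick consistency check. The sketch below assumes the column names listed above; the rows in `SAMPLE`, the module code `XXX` and the function name `weight_totals` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the assessments.csv layout described above.
# The exam row has an empty date, which the description allows.
SAMPLE = """code_module,code_presentation,id_assessment,assessment_type,date,weight
XXX,2013J,1,TMA,19,30
XXX,2013J,2,TMA,54,30
XXX,2013J,3,CMA,117,40
XXX,2013J,4,Exam,,100
"""


def weight_totals(csv_text: str) -> dict[tuple[str, str, str], float]:
    """Sum weights per (module, presentation, group); group is 'Exam' or 'Other'."""
    totals: dict[tuple[str, str, str], float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        group = "Exam" if row["assessment_type"] == "Exam" else "Other"
        key = (row["code_module"], row["code_presentation"], group)
        totals[key] += float(row["weight"])
    return dict(totals)


totals = weight_totals(SAMPLE)
for key, total in sorted(totals.items()):
    note = "" if total == 100 else "  <- differs from the typical 100%"
    print(key, total, note)
```

Because the description says "typically", a total other than 100 is better treated as something to inspect than as an error.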

    vle.csv The csv file contains information about the available materials in the VLE. Typically these are html pages, pdf files, etc. Students have access to these materials online and their interactions with the materials are recorded. The vle.csv file contains the following columns:

    • id_site – an identification number of the material.
    • code_module – an identification code for module.
    • code_presentation - the identification code of presentation.
    • activity_type – the role associated with the module material.
    • week_from – the week from which the material is planned to be used.
    • week_to – week until which the material is planned to be used.

    studentInfo.csv This file contains demographic information about the students together with their results. File contains the following columns:

    • code_module – an identification code for a module on which the student is registered.
    • code_presentation - the identification code of the presentation during which the student is registered on the module.
    • id_student – a unique identification number for the student.
    • gender – the student’s gender.
    • region – identifies the geographic region, where the student lived while taking the module-presentation.
    • highest_education – highest student education level on entry to the module presentation.
    • imd_band – specifies the Index of Multiple Deprivation band of the place where the student lived during the module-presentation.
    • age_band – band of the student’s age.
    • num_of_prev_attempts – the number of times the student has attempted this module.
    • studied_credits – the total number of credits for the modules the student is currently studying.
    • disability – indicates whether the student has declared a disability.
    • final_result – student’s final result in the module-presentation.

    studentRegistration.csv This file contains information about the time when the student registered for the module presentation. For students who unregistered, the date of unregistration is also recorded. File contains five columns:

    • code_module – an identification code for a module.
    • code_presentation - the identification code of the presentation.
    • id_student – a unique identification number for the student.
    • date_registration – the date of student’s registration on the module presentation, this is the number of days measured relative to the start of the module-presentation (e.g. the negative value -30 means that the student registered to module presentation 30 days before it started).
    • date_unr...
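Since the tables are connected through shared identifiers, studentInfo.csv and studentRegistration.csv can be combined on code_module, code_presentation and id_student. The following is a minimal sketch using only a subset of the documented columns; the sample rows and the helper names (`read_rows`, `join_on_key`, `describe_registration`) are hypothetical.

```python
import csv
import io

# Hypothetical rows using a subset of the columns described above.
INFO = """code_module,code_presentation,id_student,final_result
XXX,2013J,1001,Pass
XXX,2013J,1002,Withdrawn
"""
REGISTRATION = """code_module,code_presentation,id_student,date_registration
XXX,2013J,1001,-30
XXX,2013J,1002,5
"""

# Columns shared by both tables.
KEY = ("code_module", "code_presentation", "id_student")


def read_rows(text: str) -> list[dict[str, str]]:
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left: list[dict[str, str]], right: list[dict[str, str]]) -> list[dict[str, str]]:
    """Inner join of two row lists on the shared identifier columns."""
    index = {tuple(row[k] for k in KEY): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in KEY))
        if match is not None:
            joined.append({**row, **match})
    return joined


def describe_registration(days: int) -> str:
    """Interpret date_registration, which is relative to the presentation start."""
    if days < 0:
        return f"registered {-days} days before the start"
    if days == 0:
        return "registered on the start day"
    return f"registered {days} days after the start"


joined = join_on_key(read_rows(INFO), read_rows(REGISTRATION))
for row in joined:
    print(row["id_student"], row["final_result"], describe_registration(int(row["date_registration"])))
```

The same student can appear in several module-presentations, which is why the join uses all three columns and not id_student alone.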

  5. Linguistic description of the six clusters.

    • figshare.com
    xls
    Updated Oct 8, 2025
    + more versions
    Cite
    Paulo Fazendeiro; Paula Prata; Maria Eugénia Ferrão (2025). Linguistic description of the six clusters. [Dataset]. http://doi.org/10.1371/journal.pone.0332441.t003
    Available download formats: xls
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Paulo Fazendeiro; Paula Prata; Maria Eugénia Ferrão
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This work investigates the trade-off between data anonymization and utility, particularly focusing on the implications for equity-related research in education. Using microdata from the 2019 Brazilian National Student Performance Exam (ENADE), the study applies the (ε, δ)-Differential Privacy model to explore the impact of anonymization on the dataset’s utility for socio-educational equity analysis. By clustering both the original and anonymized datasets, the research evaluates how group categories related to students’ sociodemographic variables, such as gender, race, income, and parental education, are affected by the anonymization process. The results reveal that while anonymization techniques can preserve overall data structure, they can also lead to the suppression or misrepresentation of minority groups, introducing biases that may jeopardise the promotion of educational equity. This finding highlights the importance of involving domain experts in the interpretation of anonymized data, particularly in studies aimed at reducing socio-economic inequalities. The study concludes that careful attention is needed to prevent anonymization efforts from distorting key group categories, which could undermine the validity of data-driven policies aimed at promoting equity.
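The description does not say which mechanism the study used to satisfy (ε, δ)-Differential Privacy. As one well-known instance of that privacy model, the classic Gaussian mechanism adds normally distributed noise with standard deviation σ = Δ · sqrt(2 ln(1.25/δ)) / ε to a numeric query of sensitivity Δ, for 0 < ε < 1. The sketch below illustrates only that general mechanism on a count query; the function names and parameter values are assumptions, and it is not the study's pipeline.

```python
import math
import random


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise scale of the classic Gaussian mechanism (valid for 0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def noisy_count(true_count: int, epsilon: float, delta: float, rng: random.Random) -> float:
    """Release a count with Gaussian noise; a count query has sensitivity 1."""
    sigma = gaussian_sigma(epsilon, delta, sensitivity=1.0)
    return true_count + rng.gauss(0.0, sigma)


# Illustrative parameter values only.
rng = random.Random(0)
sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, sensitivity=1.0)
print(f"sigma = {sigma:.2f}")
print("noisy count for a group of 12:", noisy_count(12, 0.5, 1e-5, rng))
```

With these illustrative parameters the noise has a standard deviation of roughly 9.7, which is large relative to a group of a dozen records. That gives some intuition for why small groups can be the first to be distorted.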

  6. International Students

    • kaggle.com
    zip
    Updated Mar 1, 2024
    Cite
    FATEMA ISLAM MEEM (2024). International Students [Dataset]. https://www.kaggle.com/datasets/fatemaislammeem/international-students
    Available download formats: zip(6732 bytes)
    Dataset updated
    Mar 1, 2024
    Authors
    FATEMA ISLAM MEEM
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Title: Survey of International Students

    Creator: Fatema Islam Meem, Imran Hussain Mahdy

    Subject: International Students; Academic and Social Integration; University of Idaho

    Description: This dataset contains responses from a survey conducted to understand the factors affecting the academic success and social integration of international students at the University of Idaho.

    Contributor: University of Idaho

    Date: February 10, 2024

    Type: Dataset

    Format: CSV

    Source: Google Form

    Language: English

    Coverage: University of Idaho, [2014-2024]

    Sources: The primary data source was a Google Form survey designed to capture international students' perspectives on their integration into the academic and social fabric of the university. Questions were developed to explore academic challenges, social integration, support systems, and overall satisfaction with their university experience.

    Collection Methodology: The survey was distributed with the assistance of the International Programs Office (IPO) at the University of Idaho to ensure a broad reach among the target demographic. Efforts were made to design the survey questions to be clear, concise, and sensitive to the cultural diversity of the respondents. The collection process faced challenges, particularly in achieving a high response rate. Despite these obstacles, approximately 48 responses were obtained, providing valuable insights into the experiences of international students. The survey data were anonymized to protect respondents' privacy and maintain data integrity.

  7. AI Tools Usage by Indian College Students 2025

    • kaggle.com
    zip
    Updated Jul 7, 2025
    + more versions
    Cite
    Kshitij Saini (2025). AI Tools Usage by Indian College Students 2025 [Dataset]. https://www.kaggle.com/datasets/kshitijsaini121/ai-tools-usage-by-indian-college-students-2025
    Available download formats: zip(90645 bytes)
    Dataset updated
    Jul 7, 2025
    Authors
    Kshitij Saini
    Description

    AI Tool Usage by Indian College Students 2025

    This unique dataset, collected via a May 2025 survey, captures how 496 Indian college students use AI tools (e.g., ChatGPT, Gemini, Copilot) in academics. It includes 16 attributes like AI tool usage, trust, impact on grades, and internet access, ideal for education analytics and machine learning.

    Columns

    • Student_Name: Anonymized student name.
    • College_Name: College attended.
    • Stream: Academic discipline (e.g., Engineering, Arts).
    • Year_of_Study: Year of study (1–4).
    • AI_Tools_Used: Tools used (e.g., ChatGPT, Gemini).
    • Daily_Usage_Hours: Hours spent daily on AI tools.
    • Use_Cases: Purposes (e.g., Assignments, Exam Prep).
    • Trust_in_AI_Tools: Trust level (1–5).
    • Impact_on_Grades: Grade impact (-3 to +3).
    • Do_Professors_Allow_Use: Professor approval (Yes/No).
    • Preferred_AI_Tool: Preferred tool.
    • Awareness_Level: AI awareness (1–10).
    • Willing_to_Pay_for_Access: Willingness to pay (Yes/No).
    • State: Indian state.
    • Device_Used: Device (e.g., Laptop, Mobile).
    • Internet_Access: Access quality (Poor/Medium/High).

    Use Cases

    • Predict academic performance using AI tool usage.
    • Analyze trust in AI across streams or regions.
    • Cluster students by usage patterns.
    • Study digital divide via Internet_Access.

    Source: Collected via Google Forms survey in May 2025, ensuring diverse representation across India.
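The value ranges given in the column list can be turned into a simple row check before any modelling. This is a sketch that assumes the CSV uses exactly the column names and value labels listed above; the function name `validate_row` and the sample rows are hypothetical.

```python
# Ranges and categories copied from the column descriptions above.
NUMERIC_RANGES = {
    "Year_of_Study": (1, 4),
    "Trust_in_AI_Tools": (1, 5),
    "Impact_on_Grades": (-3, 3),
    "Awareness_Level": (1, 10),
}
CATEGORIES = {
    "Do_Professors_Allow_Use": {"Yes", "No"},
    "Willing_to_Pay_for_Access": {"Yes", "No"},
    "Internet_Access": {"Poor", "Medium", "High"},
}


def validate_row(row: dict[str, str]) -> list[str]:
    """Return a list of problems found in one survey row (empty if none)."""
    problems = []
    for column, (low, high) in NUMERIC_RANGES.items():
        try:
            value = float(row[column])
        except (KeyError, ValueError):
            problems.append(f"{column}: missing or not numeric")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value:g} outside {low}..{high}")
    for column, allowed in CATEGORIES.items():
        if row.get(column) not in allowed:
            problems.append(f"{column}: {row.get(column)!r} not in {sorted(allowed)}")
    return problems


# Hypothetical rows, for illustration only.
good_row = {
    "Year_of_Study": "2", "Trust_in_AI_Tools": "4", "Impact_on_Grades": "1",
    "Awareness_Level": "7", "Do_Professors_Allow_Use": "Yes",
    "Willing_to_Pay_for_Access": "No", "Internet_Access": "High",
}
bad_row = {**good_row, "Trust_in_AI_Tools": "7"}

print(validate_row(good_row))
print(validate_row(bad_row))
```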

  8. Anonymization Tools for Traffic Data Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Cite
    Growth Market Reports (2025). Anonymization Tools for Traffic Data Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/anonymization-tools-for-traffic-data-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Anonymization Tools for Traffic Data Market Outlook



    According to our latest research, the global market size for Anonymization Tools for Traffic Data reached USD 1.12 billion in 2024, reflecting robust adoption across various sectors. The market is projected to expand at a CAGR of 14.6% during the forecast period, reaching a value of USD 3.48 billion by 2033. This impressive growth is primarily driven by the increasing need for privacy-compliant data sharing and analysis in smart mobility and urban infrastructure, as well as stringent data protection regulations worldwide.
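The relationship between a start value, a compound annual growth rate (CAGR) and an end value can be checked with a few lines of arithmetic. The sketch below takes the three figures quoted above; because the text does not say whether compounding starts from 2024 or 2025, it tries both 9 and 8 annual periods, and that choice is an assumption. The function names are hypothetical.

```python
def project(start_value: float, cagr: float, periods: int) -> float:
    """Value after compounding start_value at rate cagr for the given periods."""
    return start_value * (1 + cagr) ** periods


def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Growth rate that takes start_value to end_value over the given periods."""
    return (end_value / start_value) ** (1 / periods) - 1


# Figures quoted in the description (USD billion, fractional rate).
start, end, rate = 1.12, 3.48, 0.146

for periods in (8, 9):
    projected = project(start, rate, periods)
    implied = implied_cagr(start, end, periods)
    print(f"{periods} periods: projected {projected:.2f}, implied CAGR {implied:.1%}")
```

Comparing the projected value with the quoted end value, and the implied rate with the quoted rate, shows how sensitive such forecasts are to the choice of base year.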




    The surge in demand for Anonymization Tools for Traffic Data is fundamentally fueled by the exponential growth in data generated by intelligent transportation systems, connected vehicles, and urban mobility solutions. As cities embrace smart technologies to enhance traffic flow, reduce congestion, and improve public safety, the volume of sensitive traffic data collected from various sources such as sensors, cameras, and mobile devices has soared. However, this data often contains personally identifiable information (PII), raising significant privacy concerns. The implementation of robust anonymization tools has become a necessity for organizations to comply with regulations like GDPR, CCPA, and other regional data protection laws. These tools ensure that sensitive information is effectively masked or de-identified, enabling data-driven insights without compromising individual privacy, which in turn fuels market growth.




    Another critical growth factor is the increasing collaboration between public and private entities to foster innovation in mobility analytics and urban planning. Governments, transportation authorities, and research organizations are leveraging anonymized traffic data to develop predictive models, optimize public transit routes, and design safer urban environments. The ability to securely share and analyze large volumes of traffic data without exposing personal information is central to these initiatives. Furthermore, advancements in artificial intelligence and machine learning have enhanced the capabilities of anonymization tools, allowing for more sophisticated data transformation techniques that maintain data utility while ensuring compliance. This technological evolution is propelling the adoption of anonymization solutions across diverse end-user segments.




    The proliferation of smart city projects and the integration of Internet of Things (IoT) devices in transportation infrastructure are also significant drivers for the Anonymization Tools for Traffic Data Market. As urban centers worldwide invest in real-time traffic monitoring, autonomous vehicles, and multimodal mobility platforms, the complexity and sensitivity of traffic data continue to increase. Anonymization tools have become indispensable in enabling secure data exchange among stakeholders, facilitating cross-sector collaboration, and supporting data monetization strategies. Additionally, growing public awareness around digital privacy and the reputational risks associated with data breaches are prompting organizations to prioritize data anonymization as a core component of their digital strategy.



    The advent of the Vehicle Data Anonymization Platform is revolutionizing how sensitive vehicle information is managed and utilized in the transportation sector. As connected vehicles become more prevalent, the data they generate is invaluable for enhancing traffic management, improving safety, and optimizing vehicle performance. However, this data often includes personal information that must be protected to comply with privacy regulations. A Vehicle Data Anonymization Platform provides a robust solution by ensuring that data is anonymized before it is shared or analyzed, thus preserving privacy while still allowing for valuable insights to be derived. This platform is crucial for enabling secure data exchange between automotive manufacturers, service providers, and urban planners, fostering innovation and collaboration across the mobility ecosystem.




    From a regional perspective, North America currently leads the Anonymization Tools for Traffic Data Market, accounting for the largest share in 2024. This dominance is attributed to early adoption of advanced traffic management systems, a mature regulatory landscape, and significant investments in smart

  9. Student activity data

    • data.niaid.nih.gov
    • nde-dev.biothings.io
    • +2more
    Updated Jul 2, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Erkan Er (2021). Student activity data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4331704
    Dataset updated
    Jul 2, 2021
    Dataset provided by
    Universidad de Valladolid
    Authors
    Erkan Er
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes anonymized records of student activities in a peer review activity.

  10. Wi-Fi Probe Data Anonymization Services Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Wi-Fi Probe Data Anonymization Services Market Research Report 2033 [Dataset]. https://dataintelo.com/report/wi-fi-probe-data-anonymization-services-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Wi-Fi Probe Data Anonymization Services Market Outlook



    According to our latest research, the global Wi-Fi Probe Data Anonymization Services market size reached USD 1.43 billion in 2024, reflecting robust demand driven by heightened privacy regulations and the exponential growth of smart infrastructure. The market is expected to expand at a CAGR of 17.2% from 2025 to 2033, with the forecasted market size reaching USD 5.17 billion by 2033. This growth is primarily fueled by increasing concerns regarding data privacy, the proliferation of Wi-Fi-enabled devices, and the necessity for organizations to comply with global data protection standards.




    One of the most significant growth factors for the Wi-Fi Probe Data Anonymization Services market is the intensifying regulatory landscape surrounding data privacy. With the enforcement of stringent data protection frameworks such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA), and similar regulations emerging in Asia and Latin America, organizations are under immense pressure to ensure that data collected from Wi-Fi probes is anonymized and compliant. These regulations mandate that personally identifiable information (PII) is protected, creating a strong incentive for enterprises to invest in advanced anonymization services. Additionally, the increasing frequency and sophistication of cyber threats have underscored the importance of robust data anonymization protocols, further propelling market demand.




    Another critical driver is the rapid expansion of smart city initiatives and the widespread adoption of IoT devices. Urban centers across the globe are leveraging Wi-Fi probe technology to collect valuable data on foot traffic, mobility patterns, and public safety. However, the collection of such granular data raises significant privacy concerns, necessitating the deployment of effective anonymization solutions. The ability of Wi-Fi probe data anonymization services to mask or encrypt sensitive information without compromising the utility of the data is a key value proposition, enabling municipalities and private enterprises to harness actionable insights while maintaining compliance with privacy laws. This trend is expected to accelerate as cities continue to digitize their infrastructure and prioritize citizen privacy.




    The growing integration of Wi-Fi analytics in sectors such as retail, hospitality, and transportation is also contributing to market expansion. Retailers, for instance, use Wi-Fi probe data to analyze customer behavior, optimize store layouts, and enhance personalized marketing. Similarly, transportation hubs rely on this data to manage passenger flow and improve operational efficiency. However, as consumers become increasingly aware of how their data is used, there is mounting pressure on businesses to employ anonymization services that safeguard user identities. The competitive advantage gained by maintaining consumer trust and adhering to best practices in data privacy is becoming a decisive factor for organizations across these verticals.




    Regionally, North America remains at the forefront of the Wi-Fi Probe Data Anonymization Services market, accounting for the largest revenue share in 2024, followed closely by Europe and the Asia Pacific. The dominance of North America can be attributed to a combination of advanced technological infrastructure, early adoption of privacy-centric solutions, and a mature regulatory environment. Meanwhile, the Asia Pacific region is anticipated to witness the fastest growth over the forecast period, driven by rapid urbanization, expanding digital economies, and increasing investments in smart city projects. Latin America and the Middle East & Africa are also expected to demonstrate steady growth as regulatory frameworks strengthen and digital transformation initiatives gain momentum.



    Service Type Analysis



    The service type segment of the Wi-Fi Probe Data Anonymization Services market encompasses data masking, data tokenization, data encryption, and other specialized anonymization techniques. Data masking remains a foundational approach, enabling organizations to obfuscate sensitive information such as MAC addresses and device identifiers while preserving the analytical value of the data. This technique is widely adopted in industries where real-time analytics are crucial, such as retail and transportation. The demand for data masking is expec
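To make the service types named above more concrete, the sketch below shows two simple treatments of a MAC address: masking the device-specific octets, and tokenization with a keyed hash (HMAC-SHA256). It is an illustration only and does not describe any vendor's product; the function names, the 16-character token length and the sample key are assumptions. Keyed hashing is, strictly speaking, a pseudonymization technique, since anyone holding the key can recompute the tokens.

```python
import hashlib
import hmac
import re

# Six lowercase hex octets separated by colons.
MAC_PATTERN = re.compile(r"^[0-9a-f]{2}(?::[0-9a-f]{2}){5}$")


def normalize_mac(mac: str) -> str:
    """Lowercase the address and use ':' as the separator."""
    cleaned = mac.strip().lower().replace("-", ":")
    if not MAC_PATTERN.match(cleaned):
        raise ValueError(f"not a MAC address: {mac!r}")
    return cleaned


def mask_mac(mac: str) -> str:
    """Keep the first three octets and mask the device-specific remainder."""
    octets = normalize_mac(mac).split(":")
    return ":".join(octets[:3] + ["xx"] * 3)


def tokenize_mac(mac: str, secret_key: bytes) -> str:
    """Replace the address with a keyed-hash token (stable for a given key)."""
    digest = hmac.new(secret_key, normalize_mac(mac).encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical key and address, for illustration only.
key = b"rotate-me-regularly"
print(mask_mac("AA-BB-CC-11-22-33"))
print(tokenize_mac("AA-BB-CC-11-22-33", key))
```

Masking discards information permanently, whereas tokens stay consistent per device while the key is unchanged, which is what keeps counts of repeat visits usable for analytics.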

  11. AI in STEM Education

    • kaggle.com
    zip
    Updated May 27, 2025
    Cite
    bala (2025). AI in STEM Education [Dataset]. https://www.kaggle.com/datasets/balams81/ai-in-stem-education
    Available download formats: zip(43483 bytes)
    Dataset updated
    May 27, 2025
    Authors
    bala
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains anonymized student data from a research study exploring the effects of different educational interventions on students in STEM (Science, Technology, Engineering, and Mathematics) courses across various universities. The study involved different conditions: an AI-tool-only intervention, a mindset-training-only intervention, a combined AI and mindset intervention, and a control group.

    The data encompasses a rich set of variables, including:

    • Demographics: Age, gender, ethnicity.
    • Academic Background: University, course type (e.g., Computer Science, Mathematics, Biology, Physics, Engineering), and ACT scores.
    • Psychological Measures: Baseline, mid-intervention, and final measures of intrinsic motivation, STEM anxiety, classroom comfort, and growth beliefs.
    • Intervention-Specific Perceptions: For relevant groups, perceptions of AI usefulness and ease of use.
    • Perceived Support: Perceptions of instructor universal beliefs, peer support, and faculty support.
    • Academic Outcomes: Course grade, final exam score, assignment average, class attendance rate, and study hours per week.

    This dataset is valuable for researchers and data scientists interested in educational psychology, learning analytics, the impact of AI in education, mindset interventions, and factors influencing student success in STEM fields. It allows for the exploration of how different interventions might affect student motivation, anxiety, beliefs, and academic performance over time.

  12. High School Students Performance Dataset

    • kaggle.com
    zip
    Updated Nov 15, 2025
    Cite
    Mohamed1970 (2025). High School Students Performance Dataset [Dataset]. https://www.kaggle.com/datasets/mohamed1970/high-school-students-performance-dataset
    Explore at:
    Available download formats: zip (84660 bytes)
    Dataset updated
    Nov 15, 2025
    Authors
    Mohamed1970
    Description

    ✍️ Detailed Description Content

    Comprehensive High School Student Performance Dataset (2,000 Records)

    This rich dataset contains 2,000 anonymized student records designed for in-depth analysis of academic success factors. It includes essential demographic information and performance scores across 13 core high school subjects.

    Key Data Points:

    Demographics: Includes Gender, Date of Birth, Grade, and Class.

    Academic Performance: Detailed scores (out of 20 points each) for subjects ranging from Algebra II to Pre-Calculus and Computer Science.

    Outcome Metrics: Includes Total Score and the final Result (Pass/Fail).

    Goal/Inspiration: The dataset is intended to be a strong benchmark for machine learning projects focused on:

    Predicting a student's final Result based on their demographic profile and initial subject scores.

    Identifying which subjects are the strongest predictors of overall academic success.

    Analyzing performance trends across different Grades and Classes.

  13. Student Reflections on Academic Integrity and the Use of ChatGPT in Swedish...

    • researchdata.se
    • demo.researchdata.se
    • +1 more
    Updated Sep 3, 2025
    Cite
    Christophe Premat; Alexandra Farazouli (2025). Student Reflections on Academic Integrity and the Use of ChatGPT in Swedish Higher Education. Data from a specific survey made in September 2024 [Dataset]. http://doi.org/10.17045/STHLMUNI.30040633
    Dataset updated
    Sep 3, 2025
    Dataset provided by
    Stockholm University
    Authors
    Christophe Premat; Alexandra Farazouli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains anonymized written reflections from 42 students enrolled in a course on academic integrity at Stockholm University. The data were originally collected on 6 September 2024 during a seminar on ethical aspects of using generative AI in academic work. Students were asked to respond to the question: “Should I mention that I used ChatGPT to complete an academic assignment?”

    The reflections, totaling 2,626 words, were first written in Swedish and subsequently translated into English for research purposes. The dataset includes:

    • The original student responses in Swedish.
    • The English translations of the responses.
    • Contextual information about the teaching activity (including prior exposure to plagiarism-prevention resources and a self-study course on academic integrity).

    The data are fully anonymized in compliance with GDPR and cannot be linked back to individual students. They were used in the article: Christophe Premat & Alexandra Farazouli (2025). “Academic Integrity vs. Artificial Intelligence: a tale of two AIs,” Práxis Educativa, v. 20, e24871. https://doi.org/10.5212/PraxEduc.v.20.24871.016

  14. Data from: Problem Solving and Algorithmic Development with Flowcharts

    • data.niaid.nih.gov
    Updated Jul 11, 2024
    Cite
    Smetsers-Weeda, Renske; Smetsers, Sjaak (2024). Problem Solving and Algorithmic Development with Flowcharts [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8134150
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Radboud University Nijmegen
    Authors
    Smetsers-Weeda, Renske; Smetsers, Sjaak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study reports on in-depth research into student learning using a "thinking-first" framework combined with stepwise heuristics, which gives students structure throughout the entire programming process.

    The study targeted secondary education students in an elective computer science course. There was one class of 11 Dutch high school students, 2 female and 9 male. The group was heterogeneous, with students from different academic levels and age groups. Each student’s level and previous experience with CS was determined a priori using a pretest.

    For this study we developed sets of quizzes, tasks and tests comprising code comprehension and code composition questions (including reading and creating flowchart designs). The student responses to each were analyzed.

    This repository contains the following data:

    - taxonomyPerQ.pdf: indicates taxonomy level of each (quiz, task, test) question answered by students
    - assessments_unanswered: all questions (quizzes, tasks, tests) administered to students
    - pretask (responses): anonymized student responses to pretask questions
    - midtask (responses): anonymized student responses to midtask questions
    - finaltask (responses): anonymized student responses to finaltask questions
    - quiz 1 (responses): anonymized student responses to quiz 1 questions
    - quiz 2 (responses): anonymized student responses to quiz 2 questions
    - final test (responses): anonymized student responses to final test questions

    The students' handwritten work was scanned, saved as PDF and coded in atlas.ti. These coded PDFs can no longer be anonymized and therefore cannot be openly distributed or published.

  15. E4 Data: Students: Processed Quantitative Demographic Dataset

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 27, 2020
    Cite
    Cathy Lachapelle (2020). E4 Data: Students: Processed Quantitative Demographic Dataset [Dataset]. http://doi.org/10.7910/DVN/FS5YD5
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Cathy Lachapelle
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes all anonymized quantitative student-level demographic data, and a codebook explaining the variables contained in the dataset.

  16. University Academic Misconduct Detector Dataset

    • kaggle.com
    zip
    Updated Aug 13, 2025
    Cite
    Dimpi Mittal (2025). University Academic Misconduct Detector Dataset [Dataset]. https://www.kaggle.com/datasets/dimpimittal/university-academic-misconduct-detector-dataset
    Explore at:
    Available download formats: zip (4557 bytes)
    Dataset updated
    Aug 13, 2025
    Authors
    Dimpi Mittal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The University Academic Misconduct Detector Dataset is designed to support research and development of AI-powered tools for identifying potential academic dishonesty in higher education institutions. With the rise of online and hybrid learning environments, universities face growing challenges in detecting plagiarism, contract cheating, and other forms of misconduct. This dataset provides a clean, well-structured collection of academic records, behavior logs, and submission patterns to enable data-driven solutions for academic integrity.

    Purpose & Use Cases

    The dataset aims to help:

    Researchers develop advanced machine learning models for detecting anomalies in student work.

    Educators identify early warning signs of potential misconduct and provide timely interventions.

    Universities create automated monitoring systems to flag suspicious activity in large cohorts.

    Data Scientists explore classification, clustering, and anomaly detection techniques in an educational setting.

    Dataset Composition

    The dataset contains the following key components:

    Student Metadata: Basic anonymized demographic details (e.g., age group, department, study year).

    Assignment & Exam Submissions: Scores, submission times, and similarity index from plagiarism detection tools.

    Behavioral Indicators: Login frequency, resource usage, and last-minute submission patterns.

    Misconduct Labels: Ground truth annotations indicating whether a case was flagged as misconduct or not.

    All data has been anonymized to ensure privacy and compliance with ethical guidelines.

    Applications in Machine Learning

    Binary Classification: Predict whether a student’s submission involves misconduct.

    Anomaly Detection: Spot unusual submission patterns compared to the class average.

    Feature Engineering Practice: Extract meaningful features from behavioral logs for predictive modeling.

    Explainable AI Research: Explore interpretable models for sensitive decision-making in education.
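
    The anomaly detection use case above (comparing a submission pattern with the class average) can be sketched with a simple z-score rule. This is only an illustration: the feature name `hours_before_deadline`, the sample values, and the threshold of 2.0 are assumptions and are not taken from the dataset.

```python
from statistics import fmean, pstdev


def flag_outliers(values, threshold=2.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # All values are identical, so nothing deviates from the class average.
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


# Hypothetical feature: hours between submission and deadline for one class.
hours_before_deadline = [30.0, 28.0, 32.0, 29.0, 31.0, 0.5]
flags = flag_outliers(hours_before_deadline)
print(flags)
```

    A rule like this only surfaces unusual records for review; it does not by itself establish misconduct.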

    Why This Dataset Matters

    Academic misconduct is a growing global concern that threatens the credibility of higher education. By providing this dataset, the goal is to foster transparent, ethical, and AI-assisted solutions that help maintain fairness and academic excellence.

    Ethical Considerations

    Data is synthetic/anonymized and does not represent real individuals.

    Intended for educational and research purposes only.

    Any deployment of models trained on this dataset should follow strict institutional ethics policies.

  17. Data from: ComEd's anonymized AMI energy usage data

    • openenergyhub.ornl.gov
    Updated Jul 30, 2024
    Cite
    (2024). ComEd's anonymized AMI energy usage data [Dataset]. https://openenergyhub.ornl.gov/explore/dataset/comed-s-anonymized-ami-energy-usage-data/
    Dataset updated
    Jul 30, 2024
    Description

    One of the key impacts of AMI technology is the availability of interval energy usage data, which can support the development of new products and services and enable the market to deliver greater value to customers. Requestors can now access anonymized interval energy usage data in 30-minute intervals for all zip codes where AMI meters have been deployed.

  18. A Hybrid Educational Dataset

    • kaggle.com
    Updated Jun 27, 2025
    Cite
    Emanoel Carvalho Lopes (2025). A Hybrid Educational Dataset [Dataset]. https://www.kaggle.com/datasets/emanoelcarvalholopes/uci-oulad-sintetico-unificados
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Emanoel Carvalho Lopes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    The early identification of students facing learning difficulties is one of the most critical challenges in modern education. Intervening effectively requires leveraging data to understand the complex interplay between student demographics, engagement patterns, and academic performance.

    This dataset was created to serve as a high-quality, pre-processed resource for building machine learning models to tackle this very problem. It is a unique hybrid dataset, meticulously crafted by unifying three distinct sources:

    The Open University Learning Analytics Dataset (OULAD): A rich dataset detailing student interactions with a Virtual Learning Environment (VLE). We have aggregated the raw, granular data (over 10 million interaction logs) into powerful features, such as total clicks, average assessment scores, and distinct days of activity for each student registration.

    The UCI Student Performance Dataset: A classic educational dataset containing demographic information and final grades in Portuguese and Math subjects from two Portuguese schools.

    A Synthetic Data Component: A synthetically generated portion of the data, created to balance the dataset or represent specific student profiles.
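
    The aggregation described for the OULAD component (raw interaction logs collapsed into features such as total clicks and distinct days of activity) might look roughly like the sketch below. The field names `id_student`, `date`, and `sum_click` and the sample rows are assumptions for illustration. The grouping is by student only, whereas the description aggregates per student registration.

```python
from collections import defaultdict

# Hypothetical sample of VLE interaction logs (field names are assumptions).
logs = [
    {"id_student": 1, "date": 0, "sum_click": 4},
    {"id_student": 1, "date": 0, "sum_click": 2},
    {"id_student": 1, "date": 3, "sum_click": 5},
    {"id_student": 2, "date": 1, "sum_click": 7},
]


def aggregate_logs(rows):
    """Collapse per-interaction rows into one feature record per student."""
    clicks = defaultdict(int)
    days = defaultdict(set)
    for row in rows:
        clicks[row["id_student"]] += row["sum_click"]
        days[row["id_student"]].add(row["date"])
    return {
        student: {
            "total_clicks": clicks[student],
            "distinct_days": len(days[student]),
        }
        for student in clicks
    }


features = aggregate_logs(logs)
print(features)
```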

    Data Unification and Pre-processing

    A direct merge of these sources was not possible as the student identifiers were not shared. Instead, a strategy of intelligent concatenation was employed. The final dataset has undergone a rigorous pre-processing pipeline to make it immediately usable for machine learning tasks:

    • Advanced Imputation: Missing values were handled using a sophisticated iterative imputation method powered by Gaussian Mixture Models (GMM), ensuring the dataset's integrity.

    • One-Hot Encoding: All categorical features have been converted to a numerical format.

    • Feature Scaling: All numerical features have been standardized (using StandardScaler) to have a mean of 0 and a standard deviation of 1, preventing model bias from features with different scales.

    The result is a clean, comprehensive dataset ready for modeling.
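
    Two of the pre-processing steps listed above, one-hot encoding and standardization, can be illustrated in plain Python. This is a minimal sketch under assumptions, not the dataset author's pipeline: the helper names and sample values are hypothetical, and the GMM-based imputation step is not shown. The scaling uses the population standard deviation, which is what scikit-learn's StandardScaler uses.

```python
from statistics import fmean, pstdev


def one_hot(values, prefix):
    """Return one dict per value with a 0/1 column for every category."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical sample columns.
genders = ["F", "M", "F"]
clicks = [10.0, 20.0, 30.0]

encoded = one_hot(genders, "gender")
scaled = standardize(clicks)
print(encoded)
print(scaled)
```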

    File Information

    Instance

    Each row represents a student profile, and the columns are the features and the target.

    Feature

    Features include aggregated online engagement metrics (e.g., clicks, distinct activities), academic performance (grades, scores), and student demographics (e.g., gender, age band). A key feature indicates the original data source (OULAD, UCI, Synthetic).

    Sensitive Information

    The dataset contains no Personally Identifiable Information (PII). Demographic information is presented in broad, anonymized categories.

    Key Columns:

    Target Variable:
    
      had_difficulty: The primary target for classification. This binary variable has been engineered from the original final_result column of the OULAD dataset.
    
        1: The student either failed (Fail) or withdrew (Withdrawn) from the course.
    
        0: The student passed (Pass or Distinction).
    
    Feature Groups:
    
      OULAD Aggregated Features (e.g., oulad_total_cliques, oulad_media_notas): Quantitative metrics summarizing a student's engagement and performance within the VLE.
    
      Academic Performance Features (e.g., nota_matematica_harmonizada): Harmonized grades from different data sources.
    
      Demographic Features (e.g., gender_*, age_band_*): One-hot encoded columns representing student demographics.
    
      Origin Features (e.g., origem_dado_OULAD, origem_dado_UCI): One-hot encoded columns indicating the original source of the data for each row. This allows for source-specific analysis.
    

    (Note: All numerical feature names are post-scaling and may not directly reflect their original names. Please refer to the complete column list for details.)
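
    The mapping from final_result to had_difficulty described under Target Variable can be written as a small lookup. The sketch below follows the stated rule (Fail or Withdrawn gives 1, Pass or Distinction gives 0); the function name and the error handling for unexpected labels are assumptions.

```python
# Label mapping taken from the column description above.
DIFFICULTY_LABELS = {"Fail": 1, "Withdrawn": 1, "Pass": 0, "Distinction": 0}


def had_difficulty(final_result):
    """Map an OULAD final_result label to the binary had_difficulty target."""
    try:
        return DIFFICULTY_LABELS[final_result]
    except KeyError:
        # Assumption: unknown labels are rejected rather than silently mapped.
        raise ValueError(f"unexpected final_result: {final_result!r}") from None


results = ["Pass", "Withdrawn", "Distinction", "Fail"]
targets = [had_difficulty(r) for r in results]
print(targets)
```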

    Acknowledgements

    This dataset would not be possible without the original data providers. Please acknowledge them in any work that uses this data:

    OULAD Dataset: Kuzilek, J., Hlosta, M., and Zdrahal, Z. (2017). Open University Learning Analytics dataset. Scientific Data, 4. https://analyse.kmi.open.ac.uk/open_dataset
    
    UCI Student Performance Dataset: P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS. https://archive.ics.uci.edu/ml/datasets/student+performance
    

    Inspiration

    This dataset is perfect for a variety of predictive modeling tasks. Here are a few ideas to get you started:

    • Can you build a classification model to predict had_difficulty with high recall? (Minimizing the number of at-risk students we fail to identify).
    
    • Which features are the most powerful predictors of student failure or withdrawal? (Feature Importance Analysis).

    • Can you build separate models for each data origin (origem_dado_*) and compare ...
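
    For the first idea above, recall is the share of truly at-risk students (had_difficulty = 1) that a model identifies: TP / (TP + FN). The sketch below computes it by hand on hypothetical labels and predictions; no real model output is involved.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted as 1."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if true_pos + false_neg == 0:
        # No positive cases present; recall is undefined, reported here as 0.0.
        return 0.0
    return true_pos / (true_pos + false_neg)


# Hypothetical labels: three at-risk students, one of whom the model misses.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
score = recall(y_true, y_pred)
print(score)
```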

  19. text-anonymization-benchmark-train

    • huggingface.co
    Updated Feb 11, 2025
    Cite
    Mateusz Dziemian (2025). text-anonymization-benchmark-train [Dataset]. https://huggingface.co/datasets/mattmdjaga/text-anonymization-benchmark-train
    Dataset updated
    Feb 11, 2025
    Authors
    Mateusz Dziemian
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset card for Text Anonymization Benchmark (TAB) train

      Dataset Summary
    

    This is the training split of the Text Anonymisation Benchmark. As the title says, it is a dataset focused on text anonymisation, specifically of European court documents, which contain labels by multiple annotators.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]

      Dataset Structure
    
      Data Instances
    

    [More Information… See the full description on the dataset page: https://huggingface.co/datasets/mattmdjaga/text-anonymization-benchmark-train.

  20. Cross-sectional survey of students in the North East of England raw data...

    • figshare.northumbria.ac.uk
    xlsx
    Updated May 9, 2024
    Cite
    Katie Haighton (2024). Cross-sectional survey of students in the North East of England raw data anonymised.xlsx [Dataset]. http://doi.org/10.25398/rd.northumbria.25783284.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 9, 2024
    Dataset provided by
    Northumbria University
    Authors
    Katie Haighton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A cross-sectional, mixed methods online survey was deployed to students in five universities in the NE. The survey explored whether students changed behaviour at university, what they changed and why they changed. Their engagement in physical activity, smoking, diet (consumption of fruit and vegetables) and alcohol consumption was assessed.

Cite
Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich (2024). Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure Score Analytics [data] [Dataset]. http://doi.org/10.11588/DATA/MXM0Q2

Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure Score Analytics [data]

Explore at:
Available download formats: tsv (197975), tsv (190296), tsv (191831), pdf (640128), tsv (107100), txt (3421), tsv (286102), tsv (106632)
Dataset updated
Nov 20, 2024
Dataset provided by
heiDATA
Authors
Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich
License

https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/MXM0Q2

Description

In the publication [1] we implemented anonymization and synthetization techniques for a structured data set, which was collected during the HiGHmed Use Case Cardiology study [2]. We employed the data anonymization tool ARX [3] and the data synthetization framework ASyH [4] individually and in combination. We evaluated the utility and shortcomings of the different approaches by statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores (Barcelona BioHF [5] and MAGGIC [6]) on the protected data sets. We observed only minimal deviations from the scores computed on the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats. We could demonstrate that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches and a combination thereof introduce only minimal deviations from the original data set over all features. While data synthesis techniques produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impact on data utility. We hereby share all generated data sets with the scientific community through a use and access agreement.

[1] Johann TI, Otte K, Prasser F, Dieterich C: Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics. Eur Heart J 2024;. doi://10.1093/ehjdh/ztae083
[2] Sommer KK, Amr A, Bavendiek, Beierle F, Brunecker P, Dathe H et al. Structured, harmonized, and interoperable integration of clinical routine data to compute heart failure risk scores. Life (Basel) 2022;12:749.
[3] Prasser F, Eicher J, Spengler H, Bild R, Kuhn KA. Flexible data anonymization using ARX—current status and challenges ahead. Softw Pract Exper 2020;50:1277–1304.
[4] Johann TI, Wilhelmi H. ASyH—anonymous synthesizer for health data, GitHub, 2023. Available at: https://github.com/dieterich-lab/ASyH.
[5] Lupón J, de Antonio M, Vila J, Peñafiel J, Galán A, Zamora E, et al. Development of a novel heart failure risk tool: the Barcelona bio-heart failure risk calculator (BCN Bio-HF calculator). PLoS One 2014;9:e85466.
[6] Pocock SJ, Ariti CA, McMurray JJV, Maggioni A, Køber L, Squire IB, et al. Predicting survival in heart failure: a risk score based on 39 372 patients from 30 studies. Eur Heart J 2013;34:1404–1413.
