25 datasets found
  1. SQL Databases for Students and Educators

    • zenodo.org
    bin, html
    Updated Oct 28, 2020
    Cite
    Mauricio Vargas Sepúlveda (2020). SQL Databases for Students and Educators [Dataset]. http://doi.org/10.5281/zenodo.4136985
    Explore at:
    Available download formats: bin, html
    Dataset updated
    Oct 28, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mauricio Vargas Sepúlveda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Publicly accessible databases often impose query limits or require registration. Even though I maintain public and limit-free APIs, I never wanted to host a public database because I tend to think that the connection strings are a problem for the user.

    I’ve decided to host different light/medium size databases using PostgreSQL, MySQL and SQL Server backends (in strict descending order of preference!).

    Why 3 database backends? I think there are a ton of small edge cases when moving between DB back ends and so testing lots with live databases is quite valuable. With this resource you can benchmark speed, compression, and DDL types.

    Please send me a tweet if you need the connection strings for your lectures or workshops. My Twitter username is @pachamaltese. See the SQL dumps in each section to have the data locally.
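One example of the small edge cases mentioned above is how each backend spells an auto-incrementing primary key in DDL. The sketch below is illustrative only: the table name `students`, the `name` column, and the helper `create_table_ddl` are hypothetical and are not taken from the hosted databases. It uses one common spelling per backend.

```python
# Hypothetical helper: emit CREATE TABLE DDL for the three backends named above.
# The table and column names are made up for illustration.
AUTO_INCREMENT_PK = {
    "postgresql": "id SERIAL PRIMARY KEY",
    "mysql": "id INT AUTO_INCREMENT PRIMARY KEY",
    "sqlserver": "id INT IDENTITY(1,1) PRIMARY KEY",
}


def create_table_ddl(backend: str, table: str = "students") -> str:
    """Return a CREATE TABLE statement with a backend-specific auto-increment key."""
    try:
        pk = AUTO_INCREMENT_PK[backend]
    except KeyError:
        raise ValueError(f"unsupported backend: {backend}") from None
    return f"CREATE TABLE {table} ({pk}, name VARCHAR(100) NOT NULL);"


if __name__ == "__main__":
    for backend in AUTO_INCREMENT_PK:
        print(backend, "->", create_table_ddl(backend))
```

Running the same schema through all three variants is a cheap way to spot where a dump from one backend would need editing before it loads into another.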

  2. WikiSQL Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Victor Zhong; Caiming Xiong; Richard Socher, WikiSQL Dataset [Dataset]. https://paperswithcode.com/dataset/wikisql
    Explore at:
    Authors
    Victor Zhong; Caiming Xiong; Richard Socher
    Description

    WikiSQL consists of a corpus of 87,726 hand-annotated SQL query and natural language question pairs. These SQL queries are further split into training (61,297 examples), development (9,145 examples) and test sets (17,284 examples). It can be used for natural language inference tasks related to relational databases.

  3. Database Management Software Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 19, 2025
    Cite
    Market Research Forecast (2025). Database Management Software Report [Dataset]. https://www.marketresearchforecast.com/reports/database-management-software-40762
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    Mar 19, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Database Management Software (DBMS) market is experiencing robust growth, projected to reach $1453.9 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 10.2% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing adoption of cloud-based solutions offers scalability, cost-effectiveness, and enhanced accessibility, driving significant market share for cloud-based DBMS offerings. Furthermore, the burgeoning volume of data generated across various sectors, particularly in large enterprises and SMEs, necessitates robust and efficient database management systems. The demand for advanced analytics and real-time data processing is further propelling market growth. While the market faces challenges such as data security concerns and the need for skilled professionals to manage complex DBMS systems, the overall outlook remains positive. The market segmentation reveals a strong preference for cloud-based solutions across both large enterprises and SMEs. North America currently holds a significant market share due to early adoption and technological advancements, but the Asia-Pacific region is poised for rapid growth given its expanding digital economy and increasing investment in data infrastructure. Competition among established players like IBM, Oracle, and Microsoft, alongside emerging players offering specialized solutions, ensures a dynamic and innovative market landscape. The forecast period (2025-2033) anticipates continued growth driven by several factors. Technological advancements, such as the development of NoSQL databases and in-memory databases, will cater to the evolving data management needs of businesses. The increasing integration of artificial intelligence (AI) and machine learning (ML) into DBMS solutions will enhance functionalities such as data analysis and predictive modelling, further boosting market demand. 
    Geographic expansion into developing economies, fueled by digital transformation initiatives, will also contribute to market expansion. However, maintaining robust data security practices and addressing the skills gap in DBMS management will remain crucial for sustained growth. The competitive landscape will continue to evolve with mergers, acquisitions, and technological innovations driving the market's trajectory over the coming years.
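The description gives a 2025 base of $1453.9 million and a CAGR of 10.2% through 2033 but does not state the resulting yearly figures. As a rough sketch, assuming plain annual compounding (the function name `project_market_size` is hypothetical), the implied trajectory can be computed like this:

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound base_value forward by `years` at annual growth rate `cagr`."""
    return base_value * (1.0 + cagr) ** years


if __name__ == "__main__":
    base_2025 = 1453.9  # million USD, as stated in the description
    cagr = 0.102        # 10.2% per year, as stated in the description
    for year in range(2025, 2034):
        size = project_market_size(base_2025, cagr, year - 2025)
        print(f"{year}: ${size:,.1f} million")
```

The printed values are implied by the two stated numbers only; they are not figures quoted from the report.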

  4. Additional file 1: of Examining database persistence of ISO/EN 13606...

    • springernature.figshare.com
    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Ricardo Sánchez-de-Madariaga; Adolfo Muñoz; Raimundo Lozano-Rubí; Pablo Serrano-Balazote; Antonio Castro; Oscar Moreno; Mario Pascual (2023). Additional file 1: of Examining database persistence of ISO/EN 13606 standardized electronic health record extracts: relational vs. NoSQL approaches [Dataset]. http://doi.org/10.6084/m9.figshare.c.3858004_D1.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Ricardo Sánchez-de-Madariaga; Adolfo Muñoz; Raimundo Lozano-Rubí; Pablo Serrano-Balazote; Antonio Castro; Oscar Moreno; Mario Pascual
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SQL program. Program written in SQL performing the six queries on the MySQL database. (SQL 15.3 kb)

  5. Definitions of incidence and prevalence terms.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    David A. Springate; Rosa Parisi; Ivan Olier; David Reeves; Evangelos Kontopantelis (2023). Definitions of incidence and prevalence terms. [Dataset]. http://doi.org/10.1371/journal.pone.0171784.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    David A. Springate; Rosa Parisi; Ivan Olier; David Reeves; Evangelos Kontopantelis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Definitions of incidence and prevalence terms.

  6. Purchase Order Data

    • data.ca.gov
    • catalog.data.gov
    csv, docx, pdf
    Updated Oct 23, 2019
    Cite
    California Department of General Services (2019). Purchase Order Data [Dataset]. https://data.ca.gov/dataset/purchase-order-data
    Explore at:
    Available download formats: docx, csv, pdf
    Dataset updated
    Oct 23, 2019
    Dataset authored and provided by
    California Department of General Services
    Description

    The State Contract and Procurement Registration System (SCPRS) was established in 2003 as a centralized database of information on State contracts and purchases over $5000. eSCPRS represents the data captured in the State's eProcurement (eP) system, Bidsync, as of March 16, 2009. The data provided is an extract from that system for fiscal years 2012-2013, 2013-2014, and 2014-2015.

    Data Limitations:
    Some purchase orders have multiple UNSPSC numbers; however, only the first was used to identify the purchase order. Multiple UNSPSC numbers were included to provide additional data for a DGS special event; however, this affects the formatting of the file. The source system Bidsync is being deprecated and these issues will be resolved in the future as state systems transition to Fi$cal.

    Data Collection Methodology:

    The data collection process starts with a data file from eSCPRS that is scrubbed and standardized prior to being uploaded into a SQL Server database. There are four primary tables. The Supplier, Department and United Nations Standard Products and Services Code (UNSPSC) tables are reference tables. The Supplier and Department tables are updated and mapped to the appropriate numbering schema and naming conventions. The UNSPSC table is used to categorize line item information and requires no further manipulation. The Purchase Order table contains raw data that requires conversion to the correct data format and mapping to the corresponding data fields. A stacking method is applied to the table to eliminate blanks where needed. Extraneous characters are removed from fields. The four tables are joined together and queries are executed to update the final Purchase Order Dataset table. Once the scrubbing and standardization process is complete the data is then uploaded into the SQL Server database.
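The four-table join described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for SQL Server, and every table name, column name, and sample value is hypothetical, since the source does not publish the actual schema.

```python
import sqlite3


def build_purchase_order_dataset(conn: sqlite3.Connection) -> list:
    """Create four toy tables and join them into one purchase-order result set."""
    conn.executescript(
        """
        -- Three reference tables (hypothetical columns).
        CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
        CREATE TABLE department (department_id INTEGER PRIMARY KEY, department_name TEXT);
        CREATE TABLE unspsc (unspsc_code TEXT PRIMARY KEY, category TEXT);
        -- Raw purchase-order table keyed to the reference tables.
        CREATE TABLE purchase_order (
            po_number TEXT, supplier_id INTEGER, department_id INTEGER,
            unspsc_code TEXT, total_price REAL
        );
        -- Made-up sample rows.
        INSERT INTO supplier VALUES (1, 'Acme Supply');
        INSERT INTO department VALUES (10, 'General Services');
        INSERT INTO unspsc VALUES ('00000000', 'Sample category');
        INSERT INTO purchase_order VALUES ('PO-001', 1, 10, '00000000', 6200.0);
        """
    )
    return conn.execute(
        """
        SELECT po.po_number, s.supplier_name, d.department_name,
               u.category, po.total_price
        FROM purchase_order AS po
        JOIN supplier   AS s ON s.supplier_id   = po.supplier_id
        JOIN department AS d ON d.department_id = po.department_id
        JOIN unspsc     AS u ON u.unspsc_code   = po.unspsc_code
        ORDER BY po.po_number
        """
    ).fetchall()


if __name__ == "__main__":
    for row in build_purchase_order_dataset(sqlite3.connect(":memory:")):
        print(row)
```

Keeping Supplier, Department and UNSPSC as separate reference tables means the scrubbing step only has to normalize names once, and the final dataset picks them up through the join.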

    Secondary/Related Resources:

  7. Hospital Management Dataset

    • kaggle.com
    Updated May 30, 2025
    Cite
    Kanak Baghel (2025). Hospital Management Dataset [Dataset]. https://www.kaggle.com/datasets/kanakbaghel/hospital-management-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 30, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kanak Baghel
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is a structured, multi-table dataset designed to simulate a hospital management system. It is ideal for practicing data analysis, SQL, machine learning, and healthcare analytics.

    Dataset Overview

    This dataset includes five CSV files:

    1. patients.csv – Patient demographics, contact details, registration info, and insurance data

    2. doctors.csv – Doctor profiles with specializations, experience, and contact information

    3. appointments.csv – Appointment dates, times, visit reasons, and statuses

    4. treatments.csv – Treatment types, descriptions, dates, and associated costs

    5. billing.csv – Billing amounts, payment methods, and status linked to treatments

    📁 Files & Column Descriptions

    patients.csv

    Contains patient demographic and registration details.

    Column Description

    patient_id -> Unique ID for each patient
    first_name -> Patient's first name
    last_name -> Patient's last name
    gender -> Gender (M/F)
    date_of_birth -> Date of birth
    contact_number -> Phone number
    address -> Address of the patient
    registration_date -> Date of first registration at the hospital
    insurance_provider -> Insurance company name
    insurance_number -> Policy number
    email -> Email address

    doctors.csv

    Details about the doctors working in the hospital.

    Column Description

    doctor_id -> Unique ID for each doctor
    first_name -> Doctor's first name
    last_name -> Doctor's last name
    specialization -> Medical field of expertise
    phone_number -> Contact number
    years_experience -> Total years of experience
    hospital_branch -> Branch of hospital where doctor is based
    email -> Official email address

    appointments.csv

    Records of scheduled and completed patient appointments.

    Column Description

    appointment_id -> Unique appointment ID
    patient_id -> ID of the patient
    doctor_id -> ID of the attending doctor
    appointment_date -> Date of the appointment
    appointment_time -> Time of the appointment
    reason_for_visit -> Purpose of visit (e.g., checkup)
    status -> Status (Scheduled, Completed, Cancelled)

    treatments.csv

    Information about the treatments given during appointments.

    Column Description

    treatment_id -> Unique ID for each treatment
    appointment_id -> Associated appointment ID
    treatment_type -> Type of treatment (e.g., MRI, X-ray)
    description -> Notes or procedure details
    cost -> Cost of treatment
    treatment_date -> Date when treatment was given

    billing.csv

    Billing and payment details for treatments.

    Column Description

    bill_id -> Unique billing ID
    patient_id -> ID of the billed patient
    treatment_id -> ID of the related treatment
    bill_date -> Date of billing
    amount -> Total amount billed
    payment_method -> Mode of payment (Cash, Card, Insurance)
    payment_status -> Status of payment (Paid, Pending, Failed)
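The column lists above are enough to sketch a typical practice query. The example below is a minimal sketch using Python's built-in `sqlite3`: the column names come from the descriptions of patients.csv and billing.csv, while the column types, the sample rows, and the helper names `make_demo_db` and `total_paid_per_patient` are assumptions made for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up rows (not real dataset rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (patient_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
        CREATE TABLE billing (
            bill_id TEXT PRIMARY KEY, patient_id TEXT, treatment_id TEXT,
            amount REAL, payment_status TEXT
        );
        INSERT INTO patients VALUES ('P001', 'Ada', 'Example');
        INSERT INTO patients VALUES ('P002', 'Ben', 'Sample');
        INSERT INTO billing VALUES ('B001', 'P001', 'T001', 100.0, 'Paid');
        INSERT INTO billing VALUES ('B002', 'P001', 'T002', 50.0, 'Paid');
        INSERT INTO billing VALUES ('B003', 'P002', 'T003', 75.0, 'Pending');
        """
    )
    return conn


def total_paid_per_patient(conn: sqlite3.Connection) -> list:
    """Join billing to patients and sum the amounts whose payment_status is 'Paid'."""
    return conn.execute(
        """
        SELECT p.patient_id, p.first_name, p.last_name, SUM(b.amount) AS total_paid
        FROM billing AS b
        JOIN patients AS p ON p.patient_id = b.patient_id
        WHERE b.payment_status = 'Paid'
        GROUP BY p.patient_id, p.first_name, p.last_name
        ORDER BY p.patient_id
        """
    ).fetchall()


if __name__ == "__main__":
    print(total_paid_per_patient(make_demo_db()))
```

With the real CSV files, the two `CREATE TABLE` and `INSERT` steps would be replaced by loading each file into a table of the same name; the query itself should not need to change as long as the column names match the descriptions above.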

    Possible Use Cases

    SQL queries and relational database design

    Exploratory data analysis (EDA) and dashboarding

    Machine learning projects (e.g., cost prediction, no-show analysis)

    Feature engineering and data cleaning practice

    End-to-end healthcare analytics workflows

    Recommended Tools & Resources

    SQL (joins, filters, window functions)

    Pandas and Matplotlib/Seaborn for EDA

    Scikit-learn for ML models

    Pandas Profiling for automated EDA

    Plotly for interactive visualizations

    Please note:

    All data is synthetically generated for educational and project use. No real patient information is included.

    If you find this dataset helpful, consider upvoting or sharing your insights by creating a Kaggle notebook.

  8. Available functions in rEHR.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    David A. Springate; Rosa Parisi; Ivan Olier; David Reeves; Evangelos Kontopantelis (2023). Available functions in rEHR. [Dataset]. http://doi.org/10.1371/journal.pone.0171784.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    David A. Springate; Rosa Parisi; Ivan Olier; David Reeves; Evangelos Kontopantelis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Available functions in rEHR.

  9. Opening the Valve on Pure Data Dataset

    • zenodo.org
    application/gzip, zip
    Updated Feb 7, 2024
    Cite
    Anisha Islam; Kalvin Eng; Abram Hindle (2024). Opening the Valve on Pure Data Dataset [Dataset]. http://doi.org/10.5281/zenodo.10576757
    Explore at:
    Available download formats: zip, application/gzip
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anisha Islam; Kalvin Eng; Abram Hindle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This page contains the i) SQLite database, and ii) scripts and instructions for the paper titled Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language.

    We have provided two main files in this link:

    1. dataset.tar.gz
    2. scripts_and_instructions.zip

    Additionally, the i) SQLite database, ii) scripts and instructions, and iii) mirrored repositories of the PD projects can also be found in the following link: https://archive.org/details/Opening_the_Valve_on_Pure_Data.

    The download instructions are as follows:

    1. Our dataset is available at this link and also at archive.org as a file titled dataset.tar.gz (~1.12GB). You can download the file and then extract the database by running tar -xzf dataset.tar.gz.
    2. You can also find the scripts and instructions needed to use our database and replicate our work inside the scripts_and_instructions.zip (~116MB) file, which you can download from this link and also from the same archive.org link. After that, you can unzip the scripts_and_instructions.zip file by using the command: unzip scripts_and_instructions.zip.
    3. Finally, the mirrored PD repositories are available at archive.org. The file is titled pd_mirrored.tar.gz (~242.5GB). You can download the zipped folder of the mirrored repositories using the following command: wget -c https://archive.org/download/Opening_the_Valve_on_Pure_Data/pd_mirrored.tar.gz. After that, you can unzip the file using tar -xzf pd_mirrored.tar.gz.

    You can find a README.md file inside the unzipped directory titled scripts_and_instructions detailing the structure and usage of our dataset, along with some sample SQL queries and additional helper scripts for the database. Furthermore, we have provided instructions for replicating our work in the same README file.
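Once the archive is extracted, a quick first step is to see which tables the SQLite database contains. The sketch below uses Python's built-in `sqlite3` module; the helper name `list_tables` is hypothetical, and the database file name is deliberately taken from the command line because the source does not state what the extracted file is called.

```python
import sqlite3
import sys


def list_tables(db_path: str) -> list:
    """Return the names of the tables defined in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Usage: python list_tables.py <path to the extracted .db file>
    if len(sys.argv) == 2:
        for table in list_tables(sys.argv[1]):
            print(table)
```

The README.md inside scripts_and_instructions remains the authoritative description of the schema; this only confirms that the file opened correctly.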

  10. Database Monitoring Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 26, 2025
    Cite
    Data Insights Market (2025). Database Monitoring Software Report [Dataset]. https://www.datainsightsmarket.com/reports/database-monitoring-software-1973363
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    May 26, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Database Monitoring Software market is experiencing robust growth, driven by the increasing adoption of cloud-based databases, the rise of big data analytics, and the growing need for enhanced application performance and availability. The market, estimated at $5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated $15 billion by 2033. This expansion is fueled by several key factors: the complexity of modern database environments requiring sophisticated monitoring tools, the stringent regulatory compliance mandates pushing for improved data security and reliability, and the burgeoning adoption of DevOps practices that necessitate real-time database insights. Key trends shaping this market include the integration of AI and machine learning for predictive analytics and automated alerts, the growing demand for multi-cloud database monitoring solutions, and the increasing focus on observability to proactively identify and resolve performance bottlenecks. Despite this positive outlook, challenges remain, such as the rising cost of implementation and integration, the need for skilled professionals to manage these complex systems, and the potential for vendor lock-in with proprietary solutions. The competitive landscape is marked by a diverse range of vendors, including established players like Datadog, SolarWinds, and Micro Focus, alongside niche providers catering to specific database technologies or industry verticals. The market is witnessing increased consolidation as larger players acquire smaller firms to expand their product portfolios and market reach. To maintain a competitive edge, vendors are focusing on innovation, offering comprehensive features such as performance monitoring, security auditing, and capacity planning, along with enhanced user interfaces and seamless integration with existing IT infrastructure. 
    The geographic distribution is expected to be fairly broad, with North America and Europe holding significant market share initially, followed by a steady rise in adoption across Asia-Pacific and other regions driven by digital transformation initiatives in developing economies.
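The description quotes three numbers that can be checked against each other: about $5 billion in 2025, about $15 billion by 2033, and a 15% CAGR. A small sketch of that check, assuming annual compounding over the eight years from 2025 to 2033 (the function name `implied_cagr` is hypothetical):

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    start_2025 = 5.0   # billion USD, estimate quoted in the description
    end_2033 = 15.0    # billion USD, estimate quoted in the description
    rate = implied_cagr(start_2025, end_2033, 2033 - 2025)
    print(f"Implied CAGR: {rate:.2%} (stated: 15%)")
```

The implied rate comes out slightly under the stated 15%, which is what one would expect if the endpoints in the description are rounded.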

  11. US Business Dataset - Multiple Categories

    • kaggle.com
    Updated Oct 5, 2022
    Cite
    Jeffery Mandrake (2022). US Business Dataset - Multiple Categories [Dataset]. https://www.kaggle.com/datasets/jefferymandrake/us-business-dataset-multiple-categories/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 5, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jeffery Mandrake
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Business dataset. Phone numbers, addresses and emails have been removed. This data came from an old database (over 10 years old). Use as a practice dataset for Pandas, PySpark or SQL. This dataset contains 784,156 records.

  12. Data Modeling Tool Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated May 30, 2025
    Cite
    Market Research Forecast (2025). Data Modeling Tool Report [Dataset]. https://www.marketresearchforecast.com/reports/data-modeling-tool-542143
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    May 30, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The data modeling tool market is experiencing robust growth, driven by the increasing demand for efficient data management and the rise of big data analytics. The market, estimated at $5 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $15 billion by 2033. This expansion is fueled by several key factors, including the growing adoption of cloud-based data modeling solutions, the increasing need for data governance and compliance, and the expanding use of data visualization and business intelligence tools that rely on well-structured data models. The market is segmented by tool type (e.g., ER diagramming tools, UML modeling tools), deployment mode (cloud, on-premise), and industry vertical (e.g., BFSI, healthcare, retail). Competition is intense, with established players like IBM, Oracle, and SAP vying for market share alongside numerous specialized vendors offering niche solutions. The market's growth is being further accelerated by the adoption of agile methodologies and DevOps practices that necessitate faster and more iterative data modeling processes. The major restraints impacting market growth include the high cost of advanced data modeling software, the complexity associated with implementing and maintaining these solutions, and the lack of skilled professionals adept at data modeling techniques. The increasing availability of open-source tools, coupled with the growth of professional training programs focused on data modeling, are gradually alleviating this constraint. Future growth will likely be shaped by innovations in artificial intelligence (AI) and machine learning (ML) that are being integrated into data modeling tools to automate aspects of model creation and validation. The trend towards data mesh architecture and the growing importance of data literacy are also driving demand for user-friendly and accessible data modeling tools. 
    Furthermore, the development of integrated platforms that combine data modeling with other data management functions is a key market trend that is likely to significantly impact future growth.

  13. Analysis code & Data for the combined Cogcarsim studies 2017+2019

    • figshare.com
    Updated Apr 9, 2021
    Cite
    Analysis code & Data for the combined Cogcarsim studies 2017+2019 [Dataset]. https://figshare.com/articles/dataset/Data_for_the_combined_Cogcarsim_studies_2017_2019/13567409
    Explore at:
    Available download formats: application/x-sqlite3
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Benjamin Cowley
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    CODE

    R markdown script 'cogcarsim_analyses.Rmd' will recompute the analyses from Palomäki et al 2021, “The Link Between Flow and Performance is Moderated by Task Experience”. Precompiled HTML output of this script is also provided.

    To run the script, download all contents of this Figshare object, load cogcarsim_analyses.Rmd in RStudio and knit (press Ctrl+Shift+k on Linux).

    Note also that to export figures, uncomment the corresponding lines of code (e.g. line 116: #ggsave(“figure4.pdf”, width=12, height=6))

    DATA

    SQL databases cogcarsim2_2017.db & cogcarsim2_2019.db contain the CogCarSim log data of 18 subjects, 9 from 2017 and 9 from 2019.

    background_2017.csv & background_2019.csv contain original profile data on 18 subjects. background_cogcarsim_2017.csv & background_cogcarsim_2019.csv contain cleaned-up, mutually compatible profile data on 18 subjects.

    fss_data_2017.csv & fss_data_2019.csv contain Flow Short Scale self-report data on 18 subjects. fss_learning.csv combines them and adds variables on learning derived from models fitted to data from the SQL database files. This file is generated by the accompanying R code cogcarsim_analyses.R

  14. Analytic Engineer

    • kaggle.com
    Updated Jun 7, 2022
    Cite
    NxtWave Data Engineers (2022). Analytic Engineer [Dataset]. https://www.kaggle.com/datasets/nxtwavedataengineers/data-engineer
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 7, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    NxtWave Data Engineers
    Description

    Context:

    You are an Analytics Engineer at an EdTech company focused on improving customer learning experiences. Your team relies on in-depth analysis of user data to enhance the learning journey and inform product feature updates.

    Objective:

    • Your mission is to transform raw data into structured views that enable data analysts to effectively monitor and analyze user activities, performance patterns, and feedback. These insights are critical for data-informed decision-making within the Customer Success team.

    Dataset Overview:

    • Your company organizes content in a hierarchical structure, categorized as Track -> Course -> Topic -> Lesson. Each lesson can take various formats, such as videos, practice exercises, exams, etc.
    • Any learning activity done by the user on a lesson is stored in logs in the user_lesson_progress_log table. A user can have multiple logs for a lesson in a day.
    • You have user registration data that stores registration information and demographic information of users.
    • A user can give feedback on a lesson multiple times.

    DB Diagram: https://dbdiagram.io/d/627100b17f945876b6a93e54 (use the ‘Highlight’ option to understand the relationships)

    Table Descriptions

    track_table: Contains all tracks

    Column | Description | Schema
    track_id | unique id for an individual track | string
    track_title | name of the track | string
    course_table: Contains all courses

    Column | Description | Schema
    course_id | unique id for an individual course | string
    track_id | track id to which this course belongs to | string
    course_title | name of the course | string

    topic_table: Contains all topics

    Column | Description | Schema
    topic_id | unique id for an individual topic | string
    course_id | course id to which this topic belongs to | string
    topic_title | name of the topic | string

    lesson_table: Contains all lessons

    Column | Description | Schema
    lesson_id | unique id for individual lesson | string
    topic_id | topic id to which this lesson belongs to | string
    lesson_title | name of the lesson | string
    lesson_type | type of the lesson i.e., it may be practice, video, exam | string
    duration_in_sec | ideal duration of the lesson (in seconds) in which user can complete the lesson | float

    user_registrations: Contains the registration information of the users. A user has only one entry

    Column | Description | Schema
    user_id | unique id for an individual user | string
    registration_date | date at which a user registered | string
    user_info | contains information about the users. The field stores address, education_info, and profile in JSON format | string

    user_lesson_progress_log: Any learning activity done by the user on a lesson is stored in logs. A user can have multiple logs for a lesson in a day. Every time a lesson completion percentage of a user is updated, a log is recorded here.

    Column | Description | Schema
    id | unique id for each entry | string
    user_id | unique id for an individual user | string
    lesson_id | unique id for a particular lesson | string
    overall_completion_percentage | total completion percentage of the lesson at the time of log | float
    completion_percentage_difference | Difference between the overall_completion_percentage of the lesson and the immediate preceding overall_completion_percentage | float
    activity_recorded_datetime_in_utc | datetime at which the user has done some activity on the lesson | datetime

    Example: Suppose user u1 started lesson lesson1 and completed 10% of it at May 1st 2022 8:00:00 UTC, then completed a further 30% at May 1st 2022 10:00:00 UTC and a further 20% at May 3rd 2022 10:00:00 UTC. The logs are then recorded as follows:

    id | user_id | lesson_id | overall_completion_percentage | completion_percentage_difference | activity_recorded_datetime_in_utc
    id1 | u1 | lesson1 | 10 | 10 | 2022-05-01 08:00:00
    id2 | u1 | lesson1 | 40 | 30 | 2022-05-01 10:00:00
    id3 | u1 | lesson1 | 60 | 20 | 2022-05-03 10:00:00
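The relationship between the two percentage columns in the example above can be reproduced with a window function. The sketch below is one possible way to derive completion_percentage_difference from overall_completion_percentage, not necessarily how the original logs were produced. It uses Python's built-in `sqlite3` (window functions need SQLite 3.25 or later), assumes column types, and the helper names `make_demo_log` and `completion_differences` are hypothetical.

```python
import sqlite3


def make_demo_log() -> sqlite3.Connection:
    """Load the three example log rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE user_lesson_progress_log (
            id TEXT, user_id TEXT, lesson_id TEXT,
            overall_completion_percentage REAL,
            activity_recorded_datetime_in_utc TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO user_lesson_progress_log VALUES (?, ?, ?, ?, ?)",
        [
            ("id1", "u1", "lesson1", 10, "2022-05-01 08:00:00"),
            ("id2", "u1", "lesson1", 40, "2022-05-01 10:00:00"),
            ("id3", "u1", "lesson1", 60, "2022-05-03 10:00:00"),
        ],
    )
    return conn


def completion_differences(conn: sqlite3.Connection) -> list:
    """For each log, subtract the previous overall percentage of the same user and lesson."""
    return conn.execute(
        """
        SELECT id,
               overall_completion_percentage
                 - LAG(overall_completion_percentage, 1, 0) OVER (
                       PARTITION BY user_id, lesson_id
                       ORDER BY activity_recorded_datetime_in_utc
                   ) AS completion_percentage_difference
        FROM user_lesson_progress_log
        ORDER BY user_id, lesson_id, activity_recorded_datetime_in_utc
        """
    ).fetchall()


if __name__ == "__main__":
    for row in completion_differences(make_demo_log()):
        print(row)
```

The third argument to `LAG` supplies 0 as the previous value for a user's first log on a lesson, so the first difference equals the first overall percentage, as in the example rows.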

    user_feedback: The table contains the feedback data given by the users. A user can give feedback to a lesson multiple times. Each feedback contains multiple questions. Each question and response is stored in an entry.

    Column | Description | Schema
    id | unique id for each entry | string
    feedback_id | unique id for each feedback | string
    creation_datetime | datetime at which user gave a feedback | string
    user_id | user id who gave the feedback | float
    lesson_id...
  15. Database DevOps Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 21, 2025
    Cite
    Data Insights Market (2025). Database DevOps Software Report [Dataset]. https://www.datainsightsmarket.com/reports/database-devops-software-528967
    Available download formats: doc, ppt, pdf
    Dataset updated
    May 21, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Database DevOps Software market is experiencing robust growth, driven by the increasing adoption of DevOps practices across organizations of all sizes and the rising demand for efficient database management solutions. The market, estimated at $2 billion in 2025, is projected to expand significantly over the forecast period (2025-2033), fueled by a compound annual growth rate (CAGR) of 15%.

    This growth is propelled by several key factors. The shift towards cloud-based infrastructure, offering scalability and cost-effectiveness, is a major driver. Furthermore, the growing complexity of databases and the need for automation in database deployments and management are pushing organizations to adopt Database DevOps solutions. Large enterprises are leading the adoption, but SMEs are also increasingly recognizing the value proposition, further contributing to market expansion. The demand for seamless integration with existing CI/CD pipelines and improved collaboration among development and operations teams is another key factor driving market growth.

    However, the market also faces certain restraints. The initial investment costs associated with implementing Database DevOps tools and the need for skilled professionals proficient in these tools can pose challenges for some organizations. Furthermore, integrating these tools into legacy systems can be complex and time-consuming, creating a barrier to entry for some businesses. Despite these challenges, the long-term benefits of improved efficiency, reduced risk, and faster deployment cycles are expected to outweigh the initial hurdles, ensuring continued market expansion.

    The market is segmented by application (Large Enterprises, SMEs) and type (Cloud-based, On-premise), with the cloud-based segment expected to dominate due to its inherent advantages in scalability, flexibility, and cost-optimization. Geographic expansion, particularly in rapidly developing economies in Asia-Pacific and other regions, presents substantial growth opportunities for market players.
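
To make the growth figures concrete, here is a small sketch of the compound-growth arithmetic. It assumes the 15% CAGR applies uniformly to the $2 billion 2025 base over the eight years to 2033. The description does not state a 2033 figure, so the result is only the value implied by that assumption, and the function name `project_market_size` is hypothetical.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base_value * (1 + cagr) ** years


base_2025 = 2.0  # USD billions, from the description
cagr = 0.15      # 15% per year, from the description
years = 2033 - 2025  # assumption: eight full compounding periods

implied_2033 = project_market_size(base_2025, cagr, years)
print(f"Implied 2033 market size: ${implied_2033:.2f} billion")
```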

  16. US Hate Crimes Data 1991-2019 (csv)

    • kaggle.com
    Updated Oct 5, 2022
    Cite
    Jeffery Mandrake (2022). US Hate Crimes Data 1991-2019 (csv) [Dataset]. https://www.kaggle.com/datasets/jefferymandrake/us-hate-crimes-data-1991-2019-csv
    Dataset updated
    Oct 5, 2022
    Dataset provided by
    Kaggle
    Authors
    Jeffery Mandrake
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    FBI hate crime statistics.

    The differences between this dataset and the original CSV file are:

    1. Some "less significant" columns were filtered out to make it easier to work with the dataset.
    2. The TOTAL_INDIVIDUAL_VICTIMS column was renamed to victim_count.
    3. The column names are all lower case instead of all upper case.

    Everything else was left as is (apart from the deleted columns)
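
The three changes described above could be reproduced roughly as follows. This is a sketch, not the author's actual script: only TOTAL_INDIVIDUAL_VICTIMS and victim_count come from the description, while the other column names, the sample rows, and the `simplify` helper are made up for illustration.

```python
import csv
import io

# Made-up stand-in for the original upper-case CSV. Only
# TOTAL_INDIVIDUAL_VICTIMS is a column name taken from the description.
raw_csv = io.StringIO(
    "INCIDENT_ID,STATE_NAME,TOTAL_INDIVIDUAL_VICTIMS,EXTRA_COLUMN\n"
    "1,Ohio,2,x\n"
    "2,Texas,1,y\n"
)

# Hypothetical choice of which columns count as "significant".
KEEP = ["INCIDENT_ID", "STATE_NAME", "TOTAL_INDIVIDUAL_VICTIMS"]
RENAME = {"TOTAL_INDIVIDUAL_VICTIMS": "victim_count"}


def simplify(rows, keep, rename):
    """Drop unlisted columns, apply renames, lower-case the remaining names."""
    for row in rows:
        yield {rename.get(col, col.lower()): row[col] for col in keep}


cleaned = list(simplify(csv.DictReader(raw_csv), KEEP, RENAME))
print(cleaned[0])
```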

    Practice dataset for:

    Pandas, Pyspark, SQL

  17. Regional YouTube Viral Content Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). Regional YouTube Viral Content Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/34cfa60b-afac-4753-9409-bc00f9e8fbec
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    YouTube, Data Science and Analytics
    Description

    This dataset contains YouTube trending video statistics for various Mediterranean countries. Its primary purpose is to provide insights into popular video content, channels, and viewer engagement across the region over specific periods. It is valuable for analysing content trends, understanding regional audience preferences, and assessing video performance metrics on the YouTube platform.

    Columns

    • country: The nation where the video was published.
    • video_id: A unique identification number assigned to each video.
    • title: The name of the video.
    • publishedAt: The publication date of the video.
    • channelId: The unique identification number for the channel that published the video.
    • channelTitle: The name of the channel that published the video.
    • categoryId: The category identification number of the video (e.g., '10' for 'music').
    • trending_date: The date on which the video was observed to be trending.
    • tags: Keywords or phrases associated with the video.
    • view_count: The total number of views the video has accumulated.
    • comment_count: The total number of comments received on the video.
    • thumbnail_link: The URL for the image displayed before the video is played.
    • comments_disabled: A boolean indicator showing if comments are disabled for the video.
    • ratings_disabled: A boolean indicator showing if ratings (likes/dislikes) are disabled for the video.
    • description: The explanatory text provided below the video.

    Distribution

    The dataset is structured in a tabular format, typically provided as a CSV file. It consists of 15 distinct columns detailing various aspects of YouTube trending videos. While the exact total number of rows or records is not specified, the data includes trending video counts for several date ranges in 2022:

    • 06/04/2022 - 06/08/2022: 31 records
    • 06/08/2022 - 06/11/2022: 56 records
    • 06/11/2022 - 06/15/2022: 57 records
    • 06/15/2022 - 06/19/2022: 111 records
    • 06/19/2022 - 06/22/2022: 130 records
    • 06/22/2022 - 06/26/2022: 207 records
    • 06/26/2022 - 06/29/2022: 321 records
    • 06/29/2022 - 07/03/2022: 523 records
    • 07/03/2022 - 07/07/2022: 924 records
    • 07/07/2022 - 07/10/2022: 861 records

    The dataset features 19 unique countries and 1347 unique video IDs. View counts for videos in the dataset range from approximately 20.9 thousand to 123 million.

    Usage

    This dataset is well-suited for a variety of analytical applications and use cases:

    • Exploratory Data Analysis (EDA): Discovering patterns, anomalies, and relationships within YouTube trending content.
    • Data Manipulation and Querying: Practising data handling using libraries such as Pandas or Numpy in Python, or executing queries with SQL.
    • Natural Language Processing (NLP): Analysing video titles, tags, and descriptions to extract key themes, sentiment, and trending topics.
    • Trend Prediction: Developing models to forecast future trending videos or content categories.
    • Cross-Country Comparison: Examining how trending content varies across different Mediterranean nations.
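
As a minimal sketch of the cross-country comparison use case, the snippet below finds the most-viewed video per country. It uses the column names country, video_id, and view_count from the column list. The rows themselves are made up, and `top_video_per_country` is a hypothetical helper, not part of the dataset.

```python
# Made-up sample rows; only the column names come from the dataset description.
rows = [
    {"country": "IT", "video_id": "a1", "view_count": 50000},
    {"country": "IT", "video_id": "b2", "view_count": 120000},
    {"country": "ES", "video_id": "a1", "view_count": 70000},
]


def top_video_per_country(rows):
    """Return {country: video_id} for the row with the highest view_count."""
    best = {}
    for row in rows:
        current = best.get(row["country"])
        if current is None or row["view_count"] > current["view_count"]:
            best[row["country"]] = row
    return {country: row["video_id"] for country, row in best.items()}


top = top_video_per_country(rows)
print(top)
```

The same video_id can trend in several countries, so comparisons of this kind are keyed on the (country, video_id) pair and not on video_id alone.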

    Coverage

    • Geographic Scope: The dataset covers YouTube trending video statistics for 19 specific Mediterranean countries. These include Italy (IT), Spain (ES), Greece (GR), Croatia (HR), Turkey (TR), Albania (AL), Algeria (DZ), Egypt (EG), Libya (LY), Tunisia (TN), Morocco (MA), Israel (IL), Montenegro (ME), Lebanon (LB), France (FR), Bosnia and Herzegovina (BA), Malta (MT), Slovenia (SI), Cyprus (CY), and Syria (SY).
    • Time Range: The data primarily spans from 2022-06-04 to 2022-07-10, providing detailed daily trending information. A specific snapshot of the dataset is also available for 2022-11-07.

    License

    CC0

    Who Can Use It

    • Data Scientists and Analysts: For conducting in-depth research, building predictive models, and generating insights on social media trends.
    • Researchers: Those studying online content consumption patterns, regional cultural influences, and digital media behaviour.
    • Marketing Professionals: To identify popular content types, inform content strategy, and understand audience engagement on YouTube.
    • Students: For academic projects focusing on web data analysis, natural language processing, and statistical modelling.

    Dataset Name Suggestions

    • Mediterranean YouTube Trends 2022
    • YouTube Trending Videos: Mediterranean Insights
    • Regional YouTube Viral Content
    • Mediterranean Social Media Video Data
    • YouTube Trends in Southern Europe & North Africa

    Attributes

    Original Data Source: YouTube Trending Videos of the Day

  18. Australian Employee Salary/Wages DATAbase by detailed occupation, location...

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Richard Ferrers; Australian Taxation Office (2023). Australian Employee Salary/Wages DATAbase by detailed occupation, location and year (2002-14); (plus Sole Traders) [Dataset]. http://doi.org/10.6084/m9.figshare.4522895.v5
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Richard Ferrers; Australian Taxation Office
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The ATO (Australian Tax Office) made a dataset openly available (see links) showing all the Australian Salary and Wages (2002, 2006, 2010, 2014) by detailed occupation (around 1,000) and over 100 SA4 regions. Sole Trader sales and earnings are also provided. This open data (csv) is now packaged into a database (*.sql) with 45 sample SQL queries (backupSQL[date]_public.txt). See more description at related Figshare #datavis record.

    Versions:
    V5: Following #datascience course, I have made main data (individual salary and wages) available as csv and Jupyter Notebook. Checksum matches #dataTotals. In 209,xxx rows. Also provided Jobs, and SA4(Locations) description files as csv. More details at: Where are jobs growing/shrinking? Figshare DOI: 4056282 (linked below). Noted 1% discrepancy ($6B) in 2010 wages total - to follow up.

    #dataTotals - Salary and Wages
    Year | Workers (M) | Earnings ($B)
    2002 | 8.5 | 285
    2006 | 9.4 | 372
    2010 | 10.2 | 481
    2014 | 10.3 | 584

    #dataTotal - Sole Traders
    Year | Workers (M) | Sales ($B) | Earnings ($B)
    2002 | 0.9 | 61 | 13
    2006 | 1.0 | 88 | 19
    2010 | 1.1 | 112 | 26
    2014 | 1.1 | 96 | 30

    #links
    See ATO request for data at ideascale link below. See original csv open data set (CC-BY) at data.gov.au link below. This database was used to create maps of change in regional employment - see Figshare link below (m9.figshare.4056282).

    #package
    This file package contains a database (analysing the open data) in SQL package and sample SQL text, interrogating the DB. DB name: test. There are 20 queries relating to Salary and Wages.

    #analysis
    The database was analysed and outputs provided on Nectar(.org.au) resources at: http://118.138.240.130 (offline). This is only resourced for max 1 year, from July 2016, so will expire in June 2017. Hence the filing here. The sample home page is provided here (and pdf), but not all the supporting files, which may be packaged and added later. Until then all files are available at the Nectar URL.
    Nectar URL now offline - server files attached as package (html_backup[date].zip), including php scripts, html, csv, jpegs.

    #install
    IMPORT: DB SQL dump e.g. test_2016-12-20.sql (14.8Mb)
    1. Started MAMP on OSX.
    1.1 Go to PhpMyAdmin
    2. New Database:
    3. Import: Choose file: test_2016-12-20.sql -> Go (about 15-20 seconds on MacBookPro 16Gb, 2.3 Ghz i5)
    4. four tables appeared: jobTitles 3,208 rows | salaryWages 209,697 rows | soleTrader 97,209 rows | stateNames 9 rows
    plus views e.g. deltahair, Industrycodes, states
    5. Run test query under **#; Sum of Salary by SA4 e.g. 101 $4.7B, 102 $6.9B

    #sampleSQL
    select sa4,
    (select sum(count) from salaryWages where year = '2014' and sa4 = sw.sa4) as thisYr14,
    (select sum(count) from salaryWages where year = '2010' and sa4 = sw.sa4) as thisYr10,
    (select sum(count) from salaryWages where year = '2006' and sa4 = sw.sa4) as thisYr06,
    (select sum(count) from salaryWages where year = '2002' and sa4 = sw.sa4) as thisYr02
    from salaryWages sw
    group by sa4
    order by sa4
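
To see what the sample query returns without installing MAMP, it can be tried against a tiny in-memory SQLite table from Python. This is a sketch under several assumptions: SQLite is swapped in for the MySQL setup the description mentions, the table holds only the three columns the query touches, their types are guessed, and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns used by the sample query.
conn.execute('CREATE TABLE salaryWages (sa4 TEXT, year TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO salaryWages VALUES (?, ?, ?)",
    [  # made-up rows for illustration
        ("101", "2002", 10),
        ("101", "2014", 30),
        ("101", "2014", 5),
        ("102", "2010", 4),
        ("102", "2014", 7),
    ],
)

# The sample query, lightly adapted: the "count" column is quoted so it
# cannot be mistaken for the COUNT() function.
query = """
SELECT sa4,
  (SELECT SUM("count") FROM salaryWages WHERE year = '2014' AND sa4 = sw.sa4) AS thisYr14,
  (SELECT SUM("count") FROM salaryWages WHERE year = '2010' AND sa4 = sw.sa4) AS thisYr10,
  (SELECT SUM("count") FROM salaryWages WHERE year = '2006' AND sa4 = sw.sa4) AS thisYr06,
  (SELECT SUM("count") FROM salaryWages WHERE year = '2002' AND sa4 = sw.sa4) AS thisYr02
FROM salaryWages sw
GROUP BY sa4
ORDER BY sa4
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Each correlated subquery pivots one year into its own column, so the result has one row per SA4 region. Years with no rows for a region come back as NULL (None in Python).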

  19. Global Structured Query Language Server Transformation Market Industry Best...

    • statsndata.org
    excel, pdf
    Updated May 2025
    Cite
    Stats N Data (2025). Global Structured Query Language Server Transformation Market Industry Best Practices 2025-2032 [Dataset]. https://www.statsndata.org/report/structured-query-language-server-transformation-market-78857
    Available download formats: excel, pdf
    Dataset updated
    May 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Structured Query Language (SQL) Server Transformation market is an integral segment of the data management industry, playing a crucial role in the integration, transformation, and processing of data. Widely employed by businesses across various sectors, SQL Server Transformation involves the manipulation of larg

  20. 30 Short Tips for Your Data Scientist Interview

    • kaggle.com
    Updated Oct 12, 2023
    Cite
    Skillslash17 (2023). 30 Short Tips for Your Data Scientist Interview [Dataset]. https://www.kaggle.com/datasets/skillslash17/30-short-tips-for-your-data-scientist-interview
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Skillslash17
    Description

    If you’re a data scientist looking to get ahead in the ever-changing world of data science, you know that job interviews are a crucial part of your career. But getting a job as a data scientist is not just about being tech-savvy, it’s also about having the right skillset, being able to solve problems, and having good communication skills. With competition heating up, it’s important to stand out and make a good impression on potential employers.

    Data Science has become an essential part of the contemporary business environment, enabling decision-making in a variety of industries. Consequently, organizations are increasingly looking for individuals who can utilize the power of data to generate new ideas and expand their operations. However, these roles come with a high level of expectation, requiring applicants to possess a comprehensive knowledge of data analytics and machine learning, as well as the capacity to turn their discoveries into practical solutions.

    With so many job seekers out there, it’s super important to be prepared and confident for your interview as a data scientist.

    Here are 30 tips to help you get the most out of your interview and land the job you want. No matter if you’re just starting out or have been in the field for a while, these tips will help you make the most of your interview and set you up for success.

    Technical Preparation

    Qualifying for a job as a data scientist requires thorough technical preparation. Job seekers are often required to demonstrate their technical skills to show that they can fulfill the duties of the role. Here is a selection of key tips for technical proficiency:

    1 Master the Basics

    Make sure you have a good understanding of statistics, math, and programming languages such as Python and R.

    2 Understand Machine Learning

    Gain an in-depth understanding of commonly used machine learning techniques, including linear regression and decision trees, as well as neural networks.

    3 Data Manipulation

    Make sure you're comfortable with data manipulation tools like Pandas, as well as data visualization tools like Matplotlib and Seaborn.

    4 SQL Skills

    Gain proficiency in the use of SQL language to extract and process data from databases.

    5 Feature Engineering

    Understand and know the importance of feature engineering and how to create meaningful features from raw data.

    6 Model Evaluation

    Learn to assess and compare machine learning models using metrics like accuracy, precision, recall, and F1-score.

    7 Big Data Technologies

    If the job requires it, become familiar with big data technologies like Hadoop and Spark.

    8 Coding Challenges

    Practice coding challenges related to data manipulation and machine learning on platforms like LeetCode and Kaggle.
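
As a quick refresher on the metrics named in tip 6, here is a minimal from-scratch sketch for binary classification. The function name `classification_metrics` and the sample labels are made up for illustration. In practice a library such as scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 3 true negatives,
# 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Being able to explain why precision and recall can diverge on imbalanced data, while accuracy stays high, is a common follow-up question.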

    Portfolio and Projects

    9 Build a Portfolio

    Develop a portfolio of your data science projects that outlines your methodology, the resources you have employed, and the results achieved.

    10 Kaggle Competitions

    Participate in Kaggle competitions to gain real-world experience and showcase your problem-solving skills.

    11 Open Source Contributions

    Contribute to open-source data science projects to demonstrate your collaboration and coding abilities.

    12 GitHub Profile

    Maintain a well-organized GitHub profile with clean code and clear project documentation.

    Domain Knowledge

    13 Understand the Industry

    Research the industry you’re applying to and understand its specific data challenges and opportunities.

    14 Company Research

    Study the company you’re interviewing with to tailor your responses and show your genuine interest.

    Soft Skills

    15 Communication

    Practice explaining complex concepts in simple terms. Data Scientists often need to communicate findings to non-technical stakeholders.

    16 Problem-Solving

    Focus on your problem-solving abilities and how you approach complex challenges.

    17 Adaptability

    Highlight your ability to adapt to new technologies and techniques as the field of data science evolves.

    Interview Etiquette

    18 Professional Appearance

    Dress and present yourself in a professional manner, whether the interview is in person or remote.

    19 Punctuality

    Be on time for the interview, whether it’s virtual or in person.

    20 Body Language

    Maintain good posture and eye contact during the interview. Smile and exhibit confidence.

    21 Active Listening

    Pay close attention to the interviewer's questions and answer them directly.

    Behavioral Questions

    22 STAR Method

    Use the STAR (Situation, Task, Action, Result) method to structure your responses to behavioral questions.

    23 Conflict Resolution

    Be prepared to discuss how you have handled conflicts or challenging situations in previous roles.

    24 Teamwork

    Highlight instances where you’ve worked effectively in cross-functional teams...
