Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Publicly accessible databases often impose query limits or require registration. Even though I maintain public and limit-free APIs, I never wanted to host a public database because I tend to think that the connection strings are a problem for the user.
I’ve decided to host different light/medium-sized datasets using PostgreSQL, MySQL and SQL Server backends (in strict descending order of preference!).
Why three database backends? I think there are a ton of small edge cases when moving between DB backends, so testing against live databases is quite valuable. With this resource you can benchmark speed, compression, and DDL types.
Please send me a tweet if you need the connection strings for your lectures or workshops. My Twitter username is @pachamaltese. See the SQL dumps in each section to get the data locally.
WikiSQL consists of a corpus of 87,726 hand-annotated SQL query and natural language question pairs. These SQL queries are further split into training (61,297 examples), development (9,145 examples) and test sets (17,284 examples). It can be used for natural language inference tasks related to relational databases.
https://www.marketresearchforecast.com/privacy-policy
The Database Management Software (DBMS) market is experiencing robust growth, projected to reach $1453.9 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 10.2% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing adoption of cloud-based solutions offers scalability, cost-effectiveness, and enhanced accessibility, driving significant market share for cloud-based DBMS offerings. Furthermore, the burgeoning volume of data generated across various sectors, particularly in large enterprises and SMEs, necessitates robust and efficient database management systems. The demand for advanced analytics and real-time data processing is further propelling market growth. While the market faces challenges such as data security concerns and the need for skilled professionals to manage complex DBMS systems, the overall outlook remains positive. The market segmentation reveals a strong preference for cloud-based solutions across both large enterprises and SMEs. North America currently holds a significant market share due to early adoption and technological advancements, but the Asia-Pacific region is poised for rapid growth given its expanding digital economy and increasing investment in data infrastructure. Competition among established players like IBM, Oracle, and Microsoft, alongside emerging players offering specialized solutions, ensures a dynamic and innovative market landscape.

The forecast period (2025-2033) anticipates continued growth driven by several factors. Technological advancements, such as the development of NoSQL databases and in-memory databases, will cater to the evolving data management needs of businesses. The increasing integration of artificial intelligence (AI) and machine learning (ML) into DBMS solutions will enhance functionalities such as data analysis and predictive modelling, further boosting market demand. Geographic expansion into developing economies, fueled by digital transformation initiatives, will also contribute to market expansion. However, maintaining robust data security practices and addressing the skills gap in DBMS management will remain crucial for sustained growth. The competitive landscape will continue to evolve with mergers, acquisitions, and technological innovations driving the market's trajectory over the coming years.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SQL program: a script written in SQL that performs the six queries on the MySQL database. (SQL, 15.3 kB)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Definitions of incidence and prevalence terms.
The State Contract and Procurement Registration System (SCPRS) was established in 2003 as a centralized database of information on State contracts and purchases over $5000. eSCPRS represents the data captured in the State's eProcurement (eP) system, Bidsync, as of March 16, 2009. The data provided is an extract from that system for fiscal years 2012-2013, 2013-2014, and 2014-2015.
Data Limitations:
Some purchase orders have multiple UNSPSC numbers; however, only the first was used to identify the purchase order. Multiple UNSPSC numbers were included to provide additional data for a DGS special event; however, this affects the formatting of the file. The source system Bidsync is being deprecated, and these issues will be resolved in the future as state systems transition to Fi$cal.
Data Collection Methodology:
The data collection process starts with a data file from eSCPRS that is scrubbed and standardized prior to being uploaded into a SQL Server database. There are four primary tables. The Supplier, Department and United Nations Standard Products and Services Code (UNSPSC) tables are reference tables. The Supplier and Department tables are updated and mapped to the appropriate numbering schema and naming conventions. The UNSPSC table is used to categorize line item information and requires no further manipulation. The Purchase Order table contains raw data that requires conversion to the correct data format and mapping to the corresponding data fields. A stacking method is applied to the table to eliminate blanks where needed. Extraneous characters are removed from fields. The four tables are joined together and queries are executed to update the final Purchase Order Dataset table. Once the scrubbing and standardization process is complete, the data is uploaded into the SQL Server database.
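As an illustration of the join step described above, here is a hypothetical T-SQL sketch; the actual table and column names in the SCPRS database are not published in this description, so every identifier below is an assumption:

```sql
-- Hypothetical illustration of joining the three reference tables to the
-- raw Purchase Order table; the real SCPRS schema names may differ.
SELECT po.purchase_order_number,
       po.fiscal_year,
       s.supplier_name,
       d.department_name,
       u.unspsc_title,
       po.total_amount
FROM PurchaseOrder AS po
JOIN Supplier   AS s ON s.supplier_id   = po.supplier_id
JOIN Department AS d ON d.department_id = po.department_id
JOIN UNSPSC     AS u ON u.unspsc_code   = po.unspsc_code;  -- only the first UNSPSC number per PO
```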
Secondary/Related Resources:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is a structured, multi-table dataset designed to simulate a hospital management system. It is ideal for practicing data analysis, SQL, machine learning, and healthcare analytics.
Dataset Overview
This dataset includes five CSV files:
patients.csv – Patient demographics, contact details, registration info, and insurance data
doctors.csv – Doctor profiles with specializations, experience, and contact information
appointments.csv – Appointment dates, times, visit reasons, and statuses
treatments.csv – Treatment types, descriptions, dates, and associated costs
billing.csv – Billing amounts, payment methods, and status linked to treatments
📁 Files & Column Descriptions
**patients.csv**
Contains patient demographic and registration details.
Column | Description
---|---
patient_id | Unique ID for each patient
first_name | Patient's first name
last_name | Patient's last name
gender | Gender (M/F)
date_of_birth | Date of birth
contact_number | Phone number
address | Address of the patient
registration_date | Date of first registration at the hospital
insurance_provider | Insurance company name
insurance_number | Policy number
email | Email address
**doctors.csv**
Details about the doctors working in the hospital.
Column | Description
---|---
doctor_id | Unique ID for each doctor
first_name | Doctor's first name
last_name | Doctor's last name
specialization | Medical field of expertise
phone_number | Contact number
years_experience | Total years of experience
hospital_branch | Branch of hospital where doctor is based
email | Official email address
**appointments.csv**
Records of scheduled and completed patient appointments.
Column | Description
---|---
appointment_id | Unique appointment ID
patient_id | ID of the patient
doctor_id | ID of the attending doctor
appointment_date | Date of the appointment
appointment_time | Time of the appointment
reason_for_visit | Purpose of visit (e.g., checkup)
status | Status (Scheduled, Completed, Cancelled)
**treatments.csv**
Information about the treatments given during appointments.
Column | Description
---|---
treatment_id | Unique ID for each treatment
appointment_id | Associated appointment ID
treatment_type | Type of treatment (e.g., MRI, X-ray)
description | Notes or procedure details
cost | Cost of treatment
treatment_date | Date when treatment was given
**billing.csv**
Billing and payment details for treatments.
Column | Description
---|---
bill_id | Unique billing ID
patient_id | ID of the billed patient
treatment_id | ID of the related treatment
bill_date | Date of billing
amount | Total amount billed
payment_method | Mode of payment (Cash, Card, Insurance)
payment_status | Status of payment (Paid, Pending, Failed)
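Since the files share keys (patient_id, appointment_id, treatment_id), they load naturally into a relational database. As a minimal sketch, assuming the CSVs are loaded into tables named after the files, total paid billing per patient could be computed like this:

```sql
-- Total paid billing and appointment count per patient,
-- joining the linked tables on their shared keys.
SELECT p.patient_id,
       p.first_name,
       p.last_name,
       COUNT(DISTINCT a.appointment_id) AS appointments,
       SUM(b.amount)                    AS total_paid
FROM patients p
JOIN appointments a ON a.patient_id     = p.patient_id
JOIN treatments   t ON t.appointment_id = a.appointment_id
JOIN billing      b ON b.treatment_id   = t.treatment_id
WHERE b.payment_status = 'Paid'
GROUP BY p.patient_id, p.first_name, p.last_name
ORDER BY total_paid DESC;
```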
Possible Use Cases
SQL queries and relational database design
Exploratory data analysis (EDA) and dashboarding
Machine learning projects (e.g., cost prediction, no-show analysis)
Feature engineering and data cleaning practice
End-to-end healthcare analytics workflows
Recommended Tools & Resources
SQL (joins, filters, window functions)
Pandas and Matplotlib/Seaborn for EDA
Scikit-learn for ML models
Pandas Profiling for automated EDA
Plotly for interactive visualizations
Please note:
All data is synthetically generated for educational and project use. No real patient information is included.
If you find this dataset helpful, consider upvoting or sharing your insights by creating a Kaggle notebook.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Available functions in rEHR.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This page contains the i) SQLite database, and ii) scripts and instructions for the paper titled "Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language".
We have provided two main files in this link: i) dataset.tar.gz (the SQLite database), and ii) scripts_and_instructions.zip.
Additionally, the i) SQLite database, ii) scripts and instructions, and iii) mirrored repositories of the PD projects can also be found in the following link: https://archive.org/details/Opening_the_Valve_on_Pure_Data.
The download instructions are as follows:
tar -xzf dataset.tar.gz
unzip scripts_and_instructions.zip
wget -c https://archive.org/download/Opening_the_Valve_on_Pure_Data/pd_mirrored.tar.gz
After that, you can unzip the file using tar -xzf pd_mirrored.tar.gz.
You can find a README.md file inside the unzipped directory titled scripts_and_instructions detailing the structure and usage of our dataset, along with some sample SQL queries and additional helper scripts for the database. Furthermore, we have provided instructions for replicating our work in the same README file.
https://www.datainsightsmarket.com/privacy-policy
The Database Monitoring Software market is experiencing robust growth, driven by the increasing adoption of cloud-based databases, the rise of big data analytics, and the growing need for enhanced application performance and availability. The market, estimated at $5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated $15 billion by 2033. This expansion is fueled by several key factors: the complexity of modern database environments requiring sophisticated monitoring tools, the stringent regulatory compliance mandates pushing for improved data security and reliability, and the burgeoning adoption of DevOps practices that necessitate real-time database insights. Key trends shaping this market include the integration of AI and machine learning for predictive analytics and automated alerts, the growing demand for multi-cloud database monitoring solutions, and the increasing focus on observability to proactively identify and resolve performance bottlenecks. Despite this positive outlook, challenges remain, such as the rising cost of implementation and integration, the need for skilled professionals to manage these complex systems, and the potential for vendor lock-in with proprietary solutions.

The competitive landscape is marked by a diverse range of vendors, including established players like Datadog, SolarWinds, and Micro Focus, alongside niche providers catering to specific database technologies or industry verticals. The market is witnessing increased consolidation as larger players acquire smaller firms to expand their product portfolios and market reach. To maintain a competitive edge, vendors are focusing on innovation, offering comprehensive features such as performance monitoring, security auditing, and capacity planning, along with enhanced user interfaces and seamless integration with existing IT infrastructure. The geographic distribution is expected to be fairly broad, with North America and Europe holding significant market share initially, followed by a steady rise in adoption across Asia-Pacific and other regions driven by digital transformation initiatives in developing economies.
https://creativecommons.org/publicdomain/zero/1.0/
Business dataset. Phone numbers, addresses and emails have been removed. This data came from an old database (over 10 years old). Use it as a practice dataset for Pandas, PySpark or SQL. This dataset contains 784,156 records.
https://www.marketresearchforecast.com/privacy-policy
The data modeling tool market is experiencing robust growth, driven by the increasing demand for efficient data management and the rise of big data analytics. The market, estimated at $5 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $15 billion by 2033. This expansion is fueled by several key factors, including the growing adoption of cloud-based data modeling solutions, the increasing need for data governance and compliance, and the expanding use of data visualization and business intelligence tools that rely on well-structured data models. The market is segmented by tool type (e.g., ER diagramming tools, UML modeling tools), deployment mode (cloud, on-premise), and industry vertical (e.g., BFSI, healthcare, retail). Competition is intense, with established players like IBM, Oracle, and SAP vying for market share alongside numerous specialized vendors offering niche solutions. The market's growth is being further accelerated by the adoption of agile methodologies and DevOps practices that necessitate faster and more iterative data modeling processes.

The major restraints impacting market growth include the high cost of advanced data modeling software, the complexity associated with implementing and maintaining these solutions, and the lack of skilled professionals adept at data modeling techniques. The increasing availability of open-source tools, coupled with the growth of professional training programs focused on data modeling, are gradually alleviating this constraint. Future growth will likely be shaped by innovations in artificial intelligence (AI) and machine learning (ML) that are being integrated into data modeling tools to automate aspects of model creation and validation. The trend towards data mesh architecture and the growing importance of data literacy are also driving demand for user-friendly and accessible data modeling tools. Furthermore, the development of integrated platforms that combine data modeling with other data management functions is a key market trend that is likely to significantly impact future growth.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
CODE
The R Markdown script 'cogcarsim_analyses.Rmd' will recompute the analyses from Palomäki et al. 2021, "The Link Between Flow and Performance is Moderated by Task Experience". Precompiled HTML output of this script is also provided. To run the script, download all contents of this Figshare object, load cogcarsim_analyses.Rmd in RStudio and knit (press Ctrl+Shift+K on Linux). Note also that to export figures, uncomment the corresponding lines of code (e.g. line 116: #ggsave("figure4.pdf", width=12, height=6)).

DATA
SQL databases cogcarsim2_2017.db & cogcarsim2_2019.db contain the CogCarSim log data of 18 subjects, 9 from 2017 and 9 from 2019. background_2017.csv & background_2019.csv contain original profile data on the 18 subjects. background_cogcarsim_2017.csv & background_cogcarsim_2019.csv contain cleaned-up, mutually compatible profile data on the 18 subjects. fss_data_2017.csv & fss_data_2019.csv contain Flow Short Scale self-report data on the 18 subjects. fss_learning.csv combines them and adds variables on learning derived from models fitted to data from the SQL database files. This file is generated by the accompanying R code cogcarsim_analyses.R.
You are an Analytics Engineer at an EdTech company focused on improving customer learning experiences. Your team relies on in-depth analysis of user data to enhance the learning journey and inform product feature updates.
The learning content is organized hierarchically: Track → Course → Topic → Lesson. Each lesson can take various formats, such as videos, practice exercises, exams, etc. Any learning activity a user performs on a lesson is logged in the user_lesson_progress_log table; a user can have multiple logs for a lesson in a day.
DB Diagram: https://dbdiagram.io/d/627100b17f945876b6a93e54 (use the ‘Highlight’ option to understand the relationships)
track_table
: Contains all tracks
Column | Description | Schema |
---|---|---|
track_id | unique id for an individual track | string |
track_title | name of the track | string |
course_table
: Contains all courses
Column | Description | Schema |
---|---|---|
course_id | unique id for an individual course | string |
track_id | track id to which this course belongs | string |
course_title | name of the course | string |
topic_table
: Contains all topics
Column | Description | Schema |
---|---|---|
topic_id | unique id for an individual topic | string |
course_id | course id to which this topic belongs | string |
topic_title | name of the topic | string |
lesson_table
: Contains all lessons
Column | Description | Schema |
---|---|---|
lesson_id | unique id for an individual lesson | string |
topic_id | topic id to which this lesson belongs | string |
lesson_title | name of the lesson | string |
lesson_type | type of the lesson, i.e., practice, video, or exam | string |
duration_in_sec | ideal duration (in seconds) in which a user can complete the lesson | float |
user_registrations
: Contains the registration information of the users. A user has only one entry
Column | Description | Schema |
---|---|---|
user_id | unique id for an individual user | string |
registration_date | date at which a user registered | string |
user_info | contains information about the users. The field stores address, education_info, and profile in JSON format | string |
user_lesson_progress_log
: Any learning activity done by the user on a lesson is stored in logs. A user can have multiple logs for a lesson in a day. Every time a lesson completion percentage of a user is updated, a log is recorded here.
Column | Description | Schema |
---|---|---|
id | unique id for each entry | string |
user_id | unique id for an individual user | string |
lesson_id | unique id for a particular lesson | string |
overall_completion_percentage | total completion percentage of the lesson at the time of the log | float |
completion_percentage_difference | difference between the overall_completion_percentage of the lesson and the immediately preceding overall_completion_percentage | float |
activity_recorded_datetime_in_utc | datetime at which the user performed some activity on the lesson | datetime |
Example: If user u1 started lesson lesson1 and had completed 10% of it by May 1st 2022 8:00:00 UTC, then completed a further 30% by May 1st 2022 10:00:00 UTC and a further 20% by May 3rd 2022 10:00:00 UTC, the logs are recorded as follows:
id | user_id | lesson_id | overall_completion_percentage | completion_percentage_difference | activity_recorded_datetime_in_utc |
---|---|---|---|---|---|
id1 | u1 | lesson1 | 10 | 10 | 2022-05-01 08:00:00 |
id2 | u1 | lesson1 | 40 | 30 | 2022-05-01 10:00:00 |
id3 | u1 | lesson1 | 60 | 20 | 2022-05-03 10:00:00 |
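A user's current progress on each lesson is simply the most recent log entry. Here is a minimal sketch in standard SQL, assuming only the table and column names documented above (window functions as available in PostgreSQL, SQL Server, and most modern engines):

```sql
-- Latest overall completion percentage per user and lesson:
-- keep only the most recent log entry for each (user_id, lesson_id) pair.
SELECT user_id,
       lesson_id,
       overall_completion_percentage
FROM (
    SELECT user_id,
           lesson_id,
           overall_completion_percentage,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, lesson_id
               ORDER BY activity_recorded_datetime_in_utc DESC
           ) AS rn
    FROM user_lesson_progress_log
) AS latest_logs
WHERE rn = 1;
```

For the example above, this returns 60 for u1 on lesson1 (the May 3rd log).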
user_feedback
: The table contains the feedback data given by the users. A user can give feedback on a lesson multiple times. Each feedback contains multiple questions, and each question and its response is stored as a separate entry.
Column | Description | Schema |
---|---|---|
id | unique id for each entry | string |
feedback_id | unique id for each feedback | string |
creation_datetime | datetime at which the user gave the feedback | string |
user_id | user id of the user who gave the feedback | string |
lesson_id | ... |
https://www.datainsightsmarket.com/privacy-policy
The Database DevOps Software market is experiencing robust growth, driven by the increasing adoption of DevOps practices across organizations of all sizes and the rising demand for efficient database management solutions. The market, estimated at $2 billion in 2025, is projected to expand significantly over the forecast period (2025-2033), fueled by a compound annual growth rate (CAGR) of 15%. This growth is propelled by several key factors. The shift towards cloud-based infrastructure, offering scalability and cost-effectiveness, is a major driver. Furthermore, the growing complexity of databases and the need for automation in database deployments and management are pushing organizations to adopt Database DevOps solutions. Large enterprises are leading the adoption, but SMEs are also increasingly recognizing the value proposition, further contributing to market expansion. The demand for seamless integration with existing CI/CD pipelines and improved collaboration among development and operations teams is another key factor driving market growth.

However, the market also faces certain restraints. The initial investment costs associated with implementing Database DevOps tools and the need for skilled professionals proficient in these tools can pose challenges for some organizations. Furthermore, integrating these tools into legacy systems can be complex and time-consuming, creating a barrier to entry for some businesses. Despite these challenges, the long-term benefits of improved efficiency, reduced risk, and faster deployment cycles are expected to outweigh the initial hurdles, ensuring continued market expansion. The market is segmented by application (Large Enterprises, SMEs) and type (Cloud-based, On-premise), with the cloud-based segment expected to dominate due to its inherent advantages in scalability, flexibility, and cost-optimization. Geographic expansion, particularly in rapidly developing economies in Asia-Pacific and other regions, presents substantial growth opportunities for market players.
https://creativecommons.org/publicdomain/zero/1.0/
The differences between this dataset and the original CSV file are:
1. Some "less significant" columns were filtered out to make it easier to work with the dataset.
2. The TOTAL_INDIVIDUAL_VICTIMS column was renamed to victim_count.
3. The column names are all lower case instead of all upper case.
Everything else was left as is (apart from the deleted columns)
Pandas, PySpark, SQL
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains YouTube trending video statistics for various Mediterranean countries. Its primary purpose is to provide insights into popular video content, channels, and viewer engagement across the region over specific periods. It is valuable for analysing content trends, understanding regional audience preferences, and assessing video performance metrics on the YouTube platform.
The dataset is structured in a tabular format, typically provided as a CSV file. It consists of 15 distinct columns detailing various aspects of YouTube trending videos. While the exact total number of rows or records is not specified, the data includes trending video counts for several date ranges in 2022:
* 06/04/2022 - 06/08/2022: 31 records
* 06/08/2022 - 06/11/2022: 56 records
* 06/11/2022 - 06/15/2022: 57 records
* 06/15/2022 - 06/19/2022: 111 records
* 06/19/2022 - 06/22/2022: 130 records
* 06/22/2022 - 06/26/2022: 207 records
* 06/26/2022 - 06/29/2022: 321 records
* 06/29/2022 - 07/03/2022: 523 records
* 07/03/2022 - 07/07/2022: 924 records
* 07/07/2022 - 07/10/2022: 861 records

The dataset features 19 unique countries and 1347 unique video IDs. View counts for videos in the dataset range from approximately 20.9 thousand to 123 million.
This dataset is well-suited for a variety of analytical applications and use cases:
* Exploratory Data Analysis (EDA): Discovering patterns, anomalies, and relationships within YouTube trending content.
* Data Manipulation and Querying: Practising data handling using libraries such as Pandas or Numpy in Python, or executing queries with SQL (see the sketch below).
* Natural Language Processing (NLP): Analysing video titles, tags, and descriptions to extract key themes, sentiment, and trending topics.
* Trend Prediction: Developing models to forecast future trending videos or content categories.
* Cross-Country Comparison: Examining how trending content varies across different Mediterranean nations.
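For example, once the CSV is loaded into a database table, a simple aggregation surfaces the most-viewed trending videos per country. This is a minimal sketch only: the table name trending_videos and the column names (country, video_id, title, view_count) are assumptions, so check the actual 15-column CSV header before running it.

```sql
-- Peak view count per trending video, highest first (identifiers assumed,
-- since the exact column names are not listed in this description).
SELECT country,
       video_id,
       title,
       MAX(view_count) AS peak_views
FROM trending_videos
GROUP BY country, video_id, title
ORDER BY peak_views DESC
LIMIT 20;
```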
CC0
Original Data Source: YouTube Trending Videos of the Day
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ATO (Australian Tax Office) made a dataset openly available (see links) showing all the Australian Salary and Wages (2002, 2006, 2010, 2014) by detailed occupation (around 1,000) and over 100 SA4 regions. Sole Trader sales and earnings are also provided. This open data (csv) is now packaged into a database (*.sql) with 45 sample SQL queries (backupSQL[date]_public.txt). See more description at the related Figshare #datavis record.

Versions:
V5: Following the #datascience course, I have made the main data (individual salary and wages) available as csv and Jupyter Notebook. Checksum matches #dataTotals. In 209,xxx rows. Also provided Jobs and SA4 (Locations) description files as csv. More details at: Where are jobs growing/shrinking? Figshare DOI: 4056282 (linked below). Noted 1% discrepancy ($6B) in 2010 wages total - to follow up.

#dataTotals - Salary and Wages
Year | Workers (M) | Earnings ($B)
---|---|---
2002 | 8.5 | 285
2006 | 9.4 | 372
2010 | 10.2 | 481
2014 | 10.3 | 584

#dataTotal - Sole Traders
Year | Workers (M) | Sales ($B) | Earnings ($B)
---|---|---|---
2002 | 0.9 | 61 | 13
2006 | 1.0 | 88 | 19
2010 | 1.1 | 112 | 26
2014 | 1.1 | 96 | 30

#links
See the ATO request for data at the ideascale link below. See the original csv open data set (CC-BY) at the data.gov.au link below. This database was used to create maps of change in regional employment - see the Figshare link below (m9.figshare.4056282).

#package
This file package contains a database (analysing the open data) as an SQL package and sample SQL text interrogating the DB. DB name: test. There are 20 queries relating to Salary and Wages.

#analysis
The database was analysed and outputs provided on Nectar(.org.au) resources at: http://118.138.240.130 (offline). This is only resourced for max 1 year, from July 2016, so will expire in June 2017. Hence the filing here. The sample home page is provided here (and pdf), but not all the supporting files, which may be packaged and added later. Until then all files are available at the Nectar URL. Nectar URL now offline - server files attached as package (html_backup[date].zip), including php scripts, html, csv, jpegs.

#install
IMPORT: DB SQL dump e.g. test_2016-12-20.sql (14.8Mb)
1. Start MAMP on OSX.
1.1 Go to PhpMyAdmin.
2. New Database: test
3. Import: Choose file: test_2016-12-20.sql -> Go (about 15-20 seconds on a MacBook Pro, 16Gb, 2.3 GHz i5)
4. Four tables appear: jobTitles 3,208 rows | salaryWages 209,697 rows | soleTrader 97,209 rows | stateNames 9 rows, plus views e.g. deltahair, Industrycodes, states
5. Run the test query under #sampleSQL below; Sum of Salary by SA4 e.g. 101 $4.7B, 102 $6.9B

#sampleSQL
select sa4,
  (select sum(count) from salaryWages where year = '2014' and sa4 = sw.sa4) as thisYr14,
  (select sum(count) from salaryWages where year = '2010' and sa4 = sw.sa4) as thisYr10,
  (select sum(count) from salaryWages where year = '2006' and sa4 = sw.sa4) as thisYr06,
  (select sum(count) from salaryWages where year = '2002' and sa4 = sw.sa4) as thisYr02
from salaryWages sw
group by sa4
order by sa4
https://www.statsndata.org/how-to-order
The Structured Query Language (SQL) Server Transformation market is an integral segment of the data management industry, playing a crucial role in the integration, transformation, and processing of data. Widely employed by businesses across various sectors, SQL Server Transformation involves the manipulation of larg
If you’re a data scientist looking to get ahead in the ever-changing world of data science, you know that job interviews are a crucial part of your career. But getting a job as a data scientist is not just about being tech-savvy, it’s also about having the right skillset, being able to solve problems, and having good communication skills. With competition heating up, it’s important to stand out and make a good impression on potential employers.
Data Science has become an essential part of the contemporary business environment, enabling decision-making in a variety of industries. Consequently, organizations are increasingly looking for individuals who can utilize the power of data to generate new ideas and expand their operations. However, these roles come with a high level of expectation, requiring applicants to possess a comprehensive knowledge of data analytics and machine learning, as well as the capacity to turn their discoveries into practical solutions.
With so many job seekers out there, it’s super important to be prepared and confident for your interview as a data scientist.
Here are 30 tips to help you get the most out of your interview and land the job you want. No matter if you’re just starting out or have been in the field for a while, these tips will help you make the most of your interview and set you up for success.
Technical Preparation
Qualifying for a job as a data scientist requires a comprehensive level of technical preparation. Job seekers are often required to demonstrate their technical skills to show they can effectively fulfill the duties of the role. Here is a selection of key tips for technical proficiency:
Make sure you have a good understanding of statistics, math, and programming languages such as Python and R.
Gain an in-depth understanding of commonly used machine learning techniques, including linear regression and decision trees, as well as neural networks.
Make sure you're proficient with data manipulation tools like Pandas, as well as data visualization tools like Matplotlib and Seaborn.
Gain proficiency in the use of SQL language to extract and process data from databases.
Understand and know the importance of feature engineering and how to create meaningful features from raw data.
Learn to assess and compare machine learning models using metrics like accuracy, precision, recall, and F1-score (see the formulas after this list).
If the job requires it, become familiar with big data technologies like Hadoop and Spark.
Practice coding challenges related to data manipulation and machine learning on platforms like LeetCode and Kaggle.
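As a quick refresher on the evaluation metrics mentioned above, the standard definitions in terms of true/false positives (TP, FP) and false negatives (FN) are:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

The F1-score is the harmonic mean of precision and recall, so it is high only when both are high; this is why it is preferred over plain accuracy on imbalanced datasets.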
Portfolio and Projects
Develop a portfolio of your data science projects that outlines your methodology, the resources you have employed, and the results achieved.
Participate in Kaggle competitions to gain real-world experience and showcase your problem-solving skills.
Contribute to open-source data science projects to demonstrate your collaboration and coding abilities.
Maintain a well-organized GitHub profile with clean code and clear project documentation.
Domain Knowledge
Research the industry you’re applying to and understand its specific data challenges and opportunities.
Study the company you’re interviewing with to tailor your responses and show your genuine interest.
Soft Skills
Practice explaining complex concepts in simple terms. Data Scientists often need to communicate findings to non-technical stakeholders.
Focus on your problem-solving abilities and how you approach complex challenges.
Highlight your ability to adapt to new technologies and techniques as the field of data science evolves.
Interview Etiquette
Dress and present yourself in a professional manner, whether the interview is in person or remote.
Be on time for the interview, whether it’s virtual or in person.
Maintain good posture and eye contact during the interview. Smile and exhibit confidence.
Pay close attention to the interviewer's questions and answer them directly.
Behavioral Questions
Use the STAR (Situation, Task, Action, Result) method to structure your responses to behavioral questions.
Be prepared to discuss how you have handled conflicts or challenging situations in previous roles.
Highlight instances where you’ve worked effectively in cross-functional teams...