36 datasets found
  1. Top SQL databases in software development globally 2015

    • statista.com
    Updated Aug 15, 2015
    Cite
    Statista (2015). Top SQL databases in software development globally 2015 [Dataset]. https://www.statista.com/statistics/627698/worldwide-software-developer-survey-databases-used/
    Explore at:
    Dataset updated
    Aug 15, 2015
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Apr 2015
    Area covered
    Worldwide
    Description

    The statistic displays the most popular SQL databases used by software developers worldwide, as of **********. According to the survey, ** percent of software developers were using MySQL, an open-source relational database management system (RDBMS).

  2. Grant Giving Statistics for Jacksonville Sql Server Users Group Inc.

    • instrumentl.com
    Updated Feb 25, 2022
    Cite
    (2022). Grant Giving Statistics for Jacksonville Sql Server Users Group Inc. [Dataset]. https://www.instrumentl.com/990-report/jacksonville-sql-server-users-group-inc
    Explore at:
    Dataset updated
    Feb 25, 2022
    Area covered
    Jacksonville
    Description

    Financial overview and grant giving statistics of Jacksonville Sql Server Users Group Inc.

  3. True Car Listings 2017 Project

    • kaggle.com
    zip
    Updated May 28, 2021
    Cite
    Brent D. Pafford (2021). True Car Listings 2017 Project [Dataset]. https://www.kaggle.com/brentpafford/true-car-listings-2017-project
    Explore at:
    zip (29281351 bytes)
    Dataset updated
    May 28, 2021
    Authors
    Brent D. Pafford
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This project is my first database creation. Taking real-life data from TrueCar.com listings, scraped and posted publicly by another Kaggle user, I attempt on my own to create, preprocess, and scrutinize the data: first by building a schema for a database in PostgreSQL 13 and running several queries based on self-designated questions. Using Jupyter Notebook, I then run the data through Python's pandas and scikit-learn packages for basic regression analysis. Finally, I create a dashboard via Tableau Public for helpful visualizations.

    Content

    The dataset shares all but one added column with its original: Region. The original columns are id, price, year, mileage, city, state, vin, make, and model. The Region column was a self-assigned SQL task: after the original file was uploaded into SQL, I created a new table "Regions" in the database. This data is used to visualize sales across six regions of the U.S.: Pacific, Rockies, Southwest, Midwest, Southeast, and Northeast. City and State were also combined in a new column to identify unique cities, since some cities share a name with others (e.g. Pasadena, Arlington).
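    The author's Region mapping was done in PostgreSQL; a rough pandas equivalent of the Region lookup and the City+State key, with a hypothetical and deliberately partial state-to-region mapping, might look like:

```python
import pandas as pd

# Hypothetical subset of the listing columns described above
cars = pd.DataFrame({
    "city": ["Pasadena", "Pasadena", "Arlington"],
    "state": ["CA", "TX", "VA"],
    "price": [21000, 18500, 24000],
})

# Partial, illustrative state-to-region lookup (the real table covers all states)
state_to_region = {"CA": "Pacific", "TX": "Southwest", "VA": "Southeast"}
cars["region"] = cars["state"].map(state_to_region)

# Combine City and State so same-named cities stay distinct
cars["city_state"] = cars["city"] + ", " + cars["state"]

print(cars["city_state"].tolist())
# ['Pasadena, CA', 'Pasadena, TX', 'Arlington, VA']
```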

    PostgreSQL | See my Database Creation Notes here. Python | See my notebook for performing simple analysis. Tableau | A dashboard can be found in my Tableau Public profile.

    Acknowledgements

    The dataset utilizes a .csv file extracted from www.TrueCar.com, scraped by Kaggle user Evan Payne (https://www.kaggle.com/jpayne/852k-used-car-listings/data?select=tc20171021.csv).

  4. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 15, 2024
    Cite
    Statista (2024). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Explore at:
    Dataset updated
    Jun 15, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle, and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

    Database management systems

    As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world's growing focus on big data and data analytics, knowledge of SQL has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  5. MLB Batting Data (2015-2024)

    • kaggle.com
    zip
    Updated Sep 29, 2025
    Cite
    Josue FernandezC (2025). MLB Batting Data (2015-2024) [Dataset]. https://www.kaggle.com/datasets/josuefernandezc/mlb-hitting-data-2015-2024
    Explore at:
    zip (272240 bytes)
    Dataset updated
    Sep 29, 2025
    Authors
    Josue FernandezC
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MLB Batting Stats (2015–2024)

    📝Description

    This dataset contains scraped Major League Baseball (MLB) batting statistics from Baseball Reference for the seasons 2015 through 2024. It was collected using a custom Python scraping script and then cleaned and processed in SQL for use in analytics and machine learning workflows.

    The data provides a rich view of offensive player performance across a decade of MLB history. Each row represents a player’s season, with key batting metrics such as Batting Average (BA), On-Base Percentage (OBP), Slugging (SLG), OPS, RBI, and Games Played (G). This dataset is ideal for sports analytics, predictive modeling, and trend analysis.

    ⚙️Data Collection (Python)

    Data was scraped directly from Baseball Reference using a Python script that:

    • Sent HTTP requests with browser-like headers to avoid request blocking.
    • Parsed HTML tables with pandas.read_html().
    • Added a Year column for each season.
    • Cleaned player names by removing symbols (#, *).
    • Kept summary rows for players who appeared on multiple teams/leagues.
    • Converted numeric fields and filled missing values with zeros.
    • Exported both raw and cleaned CSVs for each year.
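    The name-cleaning and numeric-conversion steps above might be sketched in pandas roughly as follows (the rows and column names here are hypothetical stand-ins, not the scraper's actual output):

```python
import pandas as pd

# Hypothetical rows resembling what pandas.read_html() returns
raw = pd.DataFrame({
    "Player": ["Mike Trout*", "Aaron Judge#", "Juan Soto"],
    "HR": ["40", "37", None],
})
raw["Year"] = 2024  # add the season column

# Strip the symbols (#, *) appended to player names
raw["Player"] = raw["Player"].str.replace(r"[#*]", "", regex=True)

# Convert numeric fields and fill missing values with zeros
raw["HR"] = pd.to_numeric(raw["HR"], errors="coerce").fillna(0).astype(int)

print(raw["Player"].tolist())  # ['Mike Trout', 'Aaron Judge', 'Juan Soto']
print(raw["HR"].tolist())      # [40, 37, 0]
```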

    🧹Data Cleaning (SQL)

    After scraping, the raw batting tables were uploaded into BigQuery and further cleaned:

    • Null values removed – Rows missing key fields (Player, BA, OBP, SLG, OPS, Pos) were excluded.
    • Duplicate records handled – Identified duplicate player–year–league entries and kept only one instance.
    • Minimum playing threshold applied – Players with fewer than 100 at-bats were removed to focus on meaningful season-long contributions.

    The final cleaned table (cleaned_batting_stats) provides consistent, duplicate-free player summaries suitable for analytics.
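    The author performed this cleaning in BigQuery SQL; a pandas sketch of the same three rules, applied to hypothetical rows, could look like:

```python
import pandas as pd

# Hypothetical player-season rows
batting = pd.DataFrame({
    "Player": ["A", "A", "B", "C"],
    "Year":   [2020, 2020, 2020, 2020],
    "Lg":     ["AL", "AL", "NL", "AL"],
    "AB":     [450, 450, 90, 300],
    "BA":     [0.280, 0.280, 0.250, None],
})

cleaned = (
    batting
    .dropna(subset=["Player", "BA"])                   # null values removed
    .drop_duplicates(subset=["Player", "Year", "Lg"])  # one row per player-year-league
    .query("AB >= 100")                                # minimum playing threshold
)
print(cleaned["Player"].tolist())  # ['A']
```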

    📊Dataset Structure

    Columns include:

    • Player – Name of the player
    • Year – Season year
    • Age – Age during the season
    • Team – Team code (2TM for multiple teams)
    • Lg – League (AL, NL, or 2LG)
    • G – Games played
    • AB, H, 2B, 3B, HR, RBI – Core batting stats
    • BA, OBP, SLG, OPS – Rate statistics
    • Pos – Primary fielding position

    🚀Potential Uses

    • League Trends: Compare batting averages and OPS across seasons.
    • Top Performer Analysis: Identify the best hitters in different eras.
    • Predictive Modeling: Forecast future player stats using regression or ML.
    • Clustering: Group players into offensive archetypes.
    • Sports Dashboards: Build interactive Tableau/Plotly dashboards for fans and analysts.

    📌Acknowledgments

    Raw data sourced from Baseball Reference.

    Inspired by open baseball datasets and community-driven sports analytics.

  6. Database management system market size worldwide 2017-2021

    • statista.com
    Updated Nov 7, 2025
    Cite
    Statista (2025). Database management system market size worldwide 2017-2021 [Dataset]. https://www.statista.com/statistics/724611/worldwide-database-market/
    Explore at:
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global database management system (DBMS) market revenue grew to ** billion U.S. dollars in 2020. Cloud DBMS accounted for the majority of the overall market growth, as database systems are migrating to cloud platforms.

    Database market

    The database market consists of paid database software such as Oracle and Microsoft SQL Server, as well as free, open-source options like PostgreSQL and MongoDB. Database management systems (DBMSs) provide a platform through which developers can organize, update, and control large databases, with products like Oracle, MySQL, and Microsoft SQL Server being the most widely used in the market.

    Database management software

    Knowledge of the programming languages related to these databases is becoming an increasingly important asset for software developers around the world, and skills with systems such as MongoDB and Elasticsearch are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  7. Global SQL Query Builders Market Competitive Landscape 2025-2032

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Cite
    Stats N Data (2025). Global SQL Query Builders Market Competitive Landscape 2025-2032 [Dataset]. https://www.statsndata.org/report/global-118410
    Explore at:
    excel, pdf
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The SQL Query Builders market has emerged as a pivotal segment in the world of database management and development, catering to the increasing need for efficient data handling across industries. These tools enable developers and analysts to construct SQL queries through user-friendly interfaces, thereby streamlining

  8. Global SQL In-Memory Database Market Strategic Recommendations 2025-2032

    • statsndata.org
    excel, pdf
    Updated Nov 2025
    Cite
    Stats N Data (2025). Global SQL In-Memory Database Market Strategic Recommendations 2025-2032 [Dataset]. https://www.statsndata.org/report/sql-in-memory-database-market-48184
    Explore at:
    excel, pdf
    Dataset updated
    Nov 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The SQL In-Memory Database market has gained significant traction over the past few years, emerging as a critical technology for enterprises seeking to enhance their data processing capabilities. By allowing data to be stored in the main memory rather than traditional disk storage, SQL In-Memory Databases provide hi

  9. SQL code for success rate datasets 2012 to 2013

    • gov.uk
    Updated Feb 25, 2014
    Cite
    Skills Funding Agency (2014). SQL code for success rate datasets 2012 to 2013 [Dataset]. https://www.gov.uk/government/statistics/sql-code-for-success-rate-datasets-2012-to-2013
    Explore at:
    Dataset updated
    Feb 25, 2014
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Skills Funding Agency
    Description

    These datasets are for:

    • classroom learning
    • workplace learning
    • apprenticeships

    They are produced from information provided in individualised learner records (ILR).

    This information is provided to help software developers and providers understand the success rate dataset production process.

  10. Current Population Survey (CPS)

    • dataverse.harvard.edu
    • search.dataone.org
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Croissant – a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D

  11. Global SQL Integrated Development Environments (IDE) Available Market Segmentation Analysis 2025-2032

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Cite
    Stats N Data (2025). Global SQL Integrated Development Environments (IDE) Available Market Segmentation Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/sql-integrated-development-environments-ide-available-market-333487
    Explore at:
    excel, pdf
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The SQL Integrated Development Environments (IDE) market has become a critical component of database management and analytics, facilitating the efficient development, testing, and deployment of database applications. As industries increasingly rely on data-driven decision-making, the demand for robust SQL IDE soluti

  12. Global Non-relational SQL Market Forecast and Trend Analysis 2025-2032

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stats N Data (2025). Global Non-relational SQL Market Forecast and Trend Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/non-relational-sql-market-151269
    Explore at:
    excel, pdf
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Non-relational SQL market, often referred to as the NoSQL market, has emerged as a pivotal force in the realm of database management, catering to a diverse array of industries that require flexible, scalable, and high-performance data storage solutions. Unlike traditional relational databases, Non-relational SQL

  13. Popularity distribution of database management systems worldwide 2024, by model

    • statista.com
    Updated Jul 1, 2025
    + more versions
    Cite
    Statista (2025). Popularity distribution of database management systems worldwide 2024, by model [Dataset]. https://www.statista.com/statistics/1131595/worldwide-popularity-database-management-systems-category/
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of December 2022, relational database management systems (RDBMS) were the most popular type of DBMS, accounting for a ** percent popularity share. The most popular RDBMS in the world has been reported as Oracle, while MySQL and Microsoft SQL Server rounded out the top three.

  14. Global NEWSQL In-Memory Database Market Forecast and Trend Analysis 2025-2032

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Cite
    Stats N Data (2025). Global NEWSQL In-Memory Database Market Forecast and Trend Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/newsql-in-memory-database-market-48183
    Explore at:
    excel, pdf
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The NEWSQL In-Memory Database market is rapidly evolving, providing businesses with the high-speed performance of in-memory processing combined with the strong consistency and reliability typical of traditional SQL databases. As organizations increasingly seek to harness real-time analytics and streamline operations

  15. Area Resource File (ARF)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). Area Resource File (ARF) [Dataset]. http://doi.org/10.7910/DVN/8NMSFV
    Explore at:
    Croissant – a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    analyze the area resource file (arf) with r

    the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health services and resources administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.

    this new github repository contains two scripts:

    2011-2012 arf - download.R
    • download the zipped area resource file directly onto your local computer
    • load the entire table into a temporary sql database
    • save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta)

    2011-2012 arf - analysis examples.R
    • limit the arf to the variables necessary for your analysis
    • sum up a few county-level statistics
    • merge the arf onto other data sets, using both fips and ssa county codes
    • create a sweet county-level map

    click here to view these two scripts

    for more detail about the area resource file (arf), visit:
    • the arf home page
    • the hrsa data warehouse

    notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data.

    confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D

  16. 2020 NFL Statistics (Active and Retired Players)

    • kaggle.com
    zip
    Updated Feb 8, 2021
    Cite
    Trevor Youngquist (2021). 2020 NFL Statistics (Active and Retired Players) [Dataset]. https://www.kaggle.com/datasets/trevyoungquist/2020-nfl-stats-active-and-retired-players
    Explore at:
    zip (3930921 bytes)
    Dataset updated
    Feb 8, 2021
    Authors
    Trevor Youngquist
    Description

    2020 NFL Stats Web Scrape

    This dataset consists of basic statistics and career statistics provided by the NFL on their official website (http://www.nfl.com) for all players, active and retired.

    Summary

    All of the data was web scraped using Python code, which can be found and downloaded here: https://github.com/ytrevor81/NFL-Stats-Web-Scrape

    Explanation of Data

    Before we go into the specifics, it's important to note in the basic statistics and career statistics CSV files that all players are assigned a 'Player_Id'. This is the same ID used by the official NFL website to identify each player. This is useful in case of, for example, importing these CSV files in a SQL database for an app.
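    As a sketch of that idea, the snippet below loads two invented rows into an in-memory SQLite database and joins a stats table back to the basic-stats table on Player_Id (the IDs and numbers here are made up, not real NFL data):

```python
import sqlite3

# Invented rows mirroring two of the CSV files, keyed by Player_Id
basic = [("00-001", "Player One", "QB"), ("00-002", "Player Two", "RB")]
rushing = [("00-002", 1200)]  # (Player_Id, rushing yards)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE basic (player_id TEXT PRIMARY KEY, name TEXT, pos TEXT)")
con.execute("CREATE TABLE rushing (player_id TEXT, yards INTEGER)")
con.executemany("INSERT INTO basic VALUES (?, ?, ?)", basic)
con.executemany("INSERT INTO rushing VALUES (?, ?)", rushing)

# Player_Id ties each per-category stats file back to the basic-stats file
rows = con.execute(
    "SELECT b.name, r.yards FROM basic b JOIN rushing r ON b.player_id = r.player_id"
).fetchall()
print(rows)  # [('Player Two', 1200)]
```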

    1. The first main group of stats is the basic stats provided for each player. This data is stored in the CSV files titled Active_Player_Basic_Stats.csv and Retired_Player_Basic_Stats.csv.

    The data pulled for each player in Active_Player_Basic_Stats.csv is as follows:

    a. Player ID
    b. Full Name
    c. Position
    d. Number
    e. Current Team
    f. Height
    g. Height
    h. Weight
    i. Experience
    j. Age
    k. College

    The data pulled for each player in Retired_Player_Basic_Stats.csv differs slightly from the previous data set:

    a. Player ID
    b. Full Name
    c. Position
    f. Height
    g. Height
    h. Weight
    j. College
    k. Hall of Fame Status

    2. The second main group of stats gathered for each player is their career statistics. Because players occupy a variety of positions, the career statistics are divided into categories. The stats for active and retired players are structured the same but stored in separate CSV files (ActivePlayer_(category)_Stats.csv and RetiredPlayer_(category)_Stats.csv). The career statistics categories and accompanying CSV file names are:

    a. Defensive Stats – ..._Defense_Stats.csv
    b. Fumbles Stats – ..._Fumbles_Stats.csv
    c. Kick Returns Stats – ..._KickReturns_Stats.csv
    d. Field Goal Kicking Stats – ..._Kicking_Stats.csv
    e. Passing Stats – ..._Passing_Stats.csv
    f. Punt Returns Stats – ..._PuntReturns_Stats.csv
    g. Punting Stats – ..._Punting_Stats.csv
    h. Receiving Stats – ..._Receiving_Stats.csv
    i. Rushing Stats – ..._Rushing_Stats.csv

  17. Football Data: Competitions, Clubs, Players

    • kaggle.com
    zip
    Updated Dec 20, 2023
    Cite
    The Devastator (2023). Football Data: Competitions, Clubs, Players [Dataset]. https://www.kaggle.com/datasets/thedevastator/football-data-competitions-clubs-players-statist/suggestions
    Explore at:
    zip (46750084 bytes)
    Dataset updated
    Dec 20, 2023
    Authors
    The Devastator
    Description

    Football Data: Competitions, Clubs, Players Statistics

    Weekly Updated Statistics from Top Football Competitions, Clubs, and Players

    By David Cereijo [source]

    About this dataset

    This dataset brings together an extensive and regularly updated collection of structured football data, sourced primarily from Transfermarkt. As a leading resource for football market values and detailed statistics, the dataset's consistent updates offer users the most precise data available.

    Including details from over 60,000 games spanning several seasons across major global competitions, it provides in-depth insights into every aspect of the game. Users have access to data from more than 400 clubs participating in these high-profile competitions. The dataset includes information about clubs' performance metrics and benchmarks.

    Moreover, individual player statistics are covered extensively for more than 30,000 players on these clubs. This includes detailed attributes like players' physical characteristics (height, primary position), team affiliations (club_id), contract status (contract_expires), and individual performances such as goals scored or assists provided.

    Beyond current valuation details for each player at a specific point in time, the database also maintains historical valuation records extending back years. The dataset contains more than 400k market value histories, providing a deep view into how performance affects value over time and across events like transfers between teams.

    In addition to overall game figures and player specifics; another centerpiece is around 1.2 million records spotlighting specific appearances by players. It supplies fine-grained competition-level performance patterns - including details such as games played by each player (appearance_record_id which is linked to game_id) along with any cards earned during play (yellow_card).

    Each CSV file within this dataset is neatly structured, containing entity-specific information along with unique IDs that can be used to establish relationships across files, enabling comprehensive, Moneyball-style analysis. The 'appearances' file exemplifies this organization with its meticulously maintained row-per-appearance layout, inclusive of key attributes related to each appearance alongside corresponding IDs (game_id & club_id).

    The entire process of creating, curating, and maintaining this dataset is executed via Python scripts and SQL databases and managed on GitHub. The backbone of the dataset creation is a specialized Python-based Transfermarkt web scraper that collects the data from its source, followed by processing of multiple terabytes of raw data to prepare it for end-user consumption.

    Finally, in keeping with its dedication toward accessibility and structure - the project also offers guidelines and channels for user interactions. It actively encourages open discussions on GitHub (issues section) based around improvements or bug-fixes that can help evolve the quality of data or aid in new enhancements.

    Overall, this dataset provides an unparalleled option to both casual enthusiasts

    How to use the dataset

    This dataset offers comprehensive football data that can be used for a myriad of analyses and visualizations. For those interested in football, you could examine player performance through the seasons, pinpoint historical trends in player market evaluations, or uncover relationships between games played and yellow cards issued.

    To use this dataset effectively:

    1. Understand which files you need: Given the huge variety of data included in this dataset, pinpointing exactly which files and columns you'll require for your analysis is essential to efficiently use this resource.
       • If you're looking at individual players' performance throughout a season, the appearances file would be most useful.
       • If your interest lies in how clubs have performed, the games file will assist you.

    2. Join relevant datasets: Each csv file has unique IDs that can be key indicators to join them together. Keep track of these IDs as they can link games with clubs or players with their appearances.

    3. Note repeated rows: Certain rows may be repeated across different CSV files – for example, an individual player’s appearance might appear once per game they played within a specific season.

    4. Use compatible software tools: Load the CSVs into common environments such as Python's pandas library or R (e.g. with ggplot2), which handle large datasets and provide packages for data manipulation and visualization.

    5. Complex Analysis (O...
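The ID-based join in step 2 can be sketched with hypothetical miniature files (the column names below are illustrative; the real files link records through shared IDs such as game_id and club_id). In practice, pandas' merge does the same thing in one call:

```python
import io
import csv

# Hypothetical miniature stand-ins for games.csv and appearances.csv;
# the real dataset links files through shared ID columns like game_id.
games_csv = "game_id,home_club_id,date\n10,3,2022-08-01\n11,4,2022-08-02\n"
appearances_csv = (
    "appearance_id,game_id,player_id,yellow_cards\n"
    "1,10,77,1\n2,10,78,0\n3,11,77,0\n"
)

# Index games by their ID so each appearance can look up its game.
games = {row["game_id"]: row for row in csv.DictReader(io.StringIO(games_csv))}

# Join each appearance to its game via game_id; with pandas this would be
# appearances.merge(games, on="game_id").
joined = [
    {**app, "date": games[app["game_id"]]["date"]}
    for app in csv.DictReader(io.StringIO(appearances_csv))
]
```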

  18. Global SSMA Connector Market Competitive Landscape 2025-2032

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stats N Data (2025). Global SSMA Connector Market Competitive Landscape 2025-2032 [Dataset]. https://www.statsndata.org/report/global-70945
    Explore at:
    pdf, excel
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The SSMA Connector market has emerged as a critical component in the realm of data integration and management, facilitating seamless connections between various database systems. The SQL Server Migration Assistant (SSMA) Connector is particularly essential for organizations looking to migrate their databases to Microsoft SQL Server.

  19. Detailed statistics of the BibSQL dataset.

    • plos.figshare.com
    xls
    Updated Oct 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zhenyu Wang; Mark Xuefang Zhu; Guo Li; Shanshan Kong (2025). Detailed statistics of the BibSQL dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0334965.t002
    Explore at:
    xls
    Dataset updated
    Oct 27, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zhenyu Wang; Mark Xuefang Zhu; Guo Li; Shanshan Kong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To overcome the limitations of current bibliographic search systems, such as low semantic precision and inadequate handling of complex queries, this study introduces a novel conversational search framework for the Chinese bibliographic domain. Our approach makes several contributions. We first developed BibSQL, the first Chinese Text-to-SQL dataset for bibliographic metadata. Using this dataset, we built a two-stage conversational system that combines semantic retrieval of relevant question-SQL pairs with in-context SQL generation by large language models (LLMs). To enhance retrieval, we designed SoftSimMatch, a supervised similarity learning model that improves semantic alignment. We further refined SQL generation using a Program-of-Thoughts (PoT) prompting strategy, which guides the LLM to produce more accurate output by first creating Python pseudocode. Experimental results demonstrate the framework’s effectiveness. Retrieval-augmented generation (RAG) significantly boosts performance, achieving up to 96.6% execution accuracy. Our SoftSimMatch-enhanced RAG approach surpasses zero-shot prompting and random example selection in both semantic alignment and SQL accuracy. Ablation studies confirm that the PoT strategy and self-correction mechanism are particularly beneficial under low-resource conditions, increasing one model’s exact matching accuracy from 74.8% to 82.9%. While acknowledging limitations such as potential logic errors in complex queries and reliance on domain-specific knowledge, the proposed framework shows strong generalizability and practical applicability. By uniquely integrating semantic similarity learning, RAG, and PoT prompting, this work establishes a scalable foundation for future intelligent bibliographic retrieval systems and domain-specific Text-to-SQL applications.
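As a rough illustration of the two-stage pipeline described above, the sketch below retrieves the most similar question-SQL pairs from a toy example bank and assembles a few-shot prompt for an LLM. The table and column names are hypothetical, and difflib's SequenceMatcher is only a crude stand-in for the paper's trained SoftSimMatch similarity model:

```python
from difflib import SequenceMatcher

# Toy question-SQL example bank standing in for BibSQL pairs (hypothetical
# bibliographic schema); SequenceMatcher approximates, very crudely, what
# the trained SoftSimMatch retriever does in the actual framework.
EXAMPLES = [
    ("List all books by author Lu Xun",
     "SELECT title FROM books WHERE author = 'Lu Xun';"),
    ("How many books were published in 1990?",
     "SELECT COUNT(*) FROM books WHERE year = 1990;"),
    ("Show titles of books published after 2000",
     "SELECT title FROM books WHERE year > 2000;"),
]

def retrieve_examples(question, k=2):
    """Return the k question-SQL pairs most similar to the input question."""
    return sorted(
        EXAMPLES,
        key=lambda pair: SequenceMatcher(None, question.lower(),
                                         pair[0].lower()).ratio(),
        reverse=True,
    )[:k]

def build_prompt(question, k=2):
    """Assemble a few-shot prompt for in-context SQL generation by an LLM."""
    parts = ["Translate the question into SQL.\n"]
    for q, sql in retrieve_examples(question, k):
        parts.append(f"Q: {q}\nSQL: {sql}\n")
    parts.append(f"Q: {question}\nSQL:")
    return "\n".join(parts)
```

The second stage, SQL generation with Program-of-Thoughts prompting and self-correction, would then feed this prompt to an LLM; that part is omitted here.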

  20. Data and tools for studying isograms

    • figshare.com
    Updated Jul 31, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Florian Breit (2017). Data and tools for studying isograms [Dataset]. http://doi.org/10.6084/m9.figshare.5245810.v1
    Explore at:
    application/x-sqlite3
    Dataset updated
    Jul 31, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Florian Breit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of datasets and Python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

    1. Datasets

    The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

    1.1 CSV format

    The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name.

    The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure; see the section below):

    Label Data type Description

    isogramy int The order of isogramy, e.g. "2" is a second order isogram

    length int The length of the word in letters

    word text The actual word/isogram in ASCII

    source_pos text The Part of Speech tag from the original corpus

    count int Token count (total number of occurrences)

    vol_count int Volume count (number of different sources which contain the word)

    count_per_million int Token count per million words

    vol_count_as_percent int Volume count as percentage of the total number of volumes

    is_palindrome bool Whether the word is a palindrome (1) or not (0)

    is_tautonym bool Whether the word is a tautonym (1) or not (0)

    The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:

    Label Data type Description

    !total_1grams int The total number of words in the corpus

    !total_volumes int The total number of volumes (individual sources) in the corpus

    !total_isograms int The total number of isograms found in the corpus (before compacting)

    !total_palindromes int How many of the isograms found are palindromes

    !total_tautonyms int How many of the isograms found are tautonyms
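A minimal Python sketch of loading one of the ".csv" files into records, assuming only the tab-separated, header-less layout and column order described above (the file handling is illustrative):

```python
import csv
from collections import namedtuple

# Field names follow the column table in the dataset description.
Isogram = namedtuple("Isogram", [
    "isogramy", "length", "word", "source_pos", "count", "vol_count",
    "count_per_million", "vol_count_as_percent", "is_palindrome", "is_tautonym",
])

def read_isograms(path):
    """Read a tab-separated, header-less isogram CSV into Isogram records."""
    records = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            records.append(Isogram(
                int(row[0]), int(row[1]), row[2], row[3],
                int(row[4]), int(row[5]), int(row[6]), int(row[7]),
                bool(int(row[8])), bool(int(row[9])),
            ))
    return records
```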

    The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

    1.2 SQLite database format

    The SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:

    • Compacted versions of each dataset, where identical headwords are combined into a single entry.
    • A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
    • An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.

    The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.

    2. Scripts

    There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data.
The first script can be run using Python 3, the second using SQLite 3 from the command line, and the third in R/RStudio (R version 3).

2.1 Source data

The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.

2.2 Data preparation

Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:

python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
python isograms.py --bnc --indir=INFILE --outfile=OUTFILE

Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

2.3 Isogram extraction

After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:

python isograms.py --batch --infile=INFILE --outfile=OUTFILE

Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files: one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

2.4 Creating a SQLite3 database

The output data from the above step can easily be collated into a SQLite3 database, which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:

1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
2. Copy the "create-database.sql" script into the same directory as the two data files.
3. On the command line, go to the directory where the files and the SQL script are.
4. Type: sqlite3 isograms.db
5. This will create a database called "isograms.db".

See section 1 for a basic description of the output data and how to work with the database.

2.5 Statistical processing

The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
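Because the database's table layout mirrors the CSV columns, it can be queried directly with Python's sqlite3 module. A minimal sketch using an in-memory database with toy rows (the table name ngrams_isograms is an assumption; inspect sqlite_master in the real isograms.db for the actual table names):

```python
import sqlite3

# Schema follows the column list in the dataset description; the table name
# "ngrams_isograms" is an assumption, not taken from the real database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ngrams_isograms (
        isogramy INTEGER, length INTEGER, word TEXT, source_pos TEXT,
        count INTEGER, vol_count INTEGER, count_per_million INTEGER,
        vol_count_as_percent INTEGER, is_palindrome INTEGER, is_tautonym INTEGER
    )
""")
# Two toy rows standing in for real data: "pilot" is a first-order isogram,
# "anna" a second-order isogram that is also a palindrome.
conn.executemany(
    "INSERT INTO ngrams_isograms VALUES (?,?,?,?,?,?,?,?,?,?)",
    [(1, 5, "pilot", "NOUN", 1200, 300, 4, 10, 0, 0),
     (2, 4, "anna", "NOUN", 900, 200, 3, 7, 1, 0)],
)
# Example query: palindromic isograms, most frequent first.
rows = conn.execute(
    "SELECT word, length FROM ngrams_isograms "
    "WHERE is_palindrome = 1 ORDER BY count DESC"
).fetchall()
print(rows)  # -> [('anna', 4)]
```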
