22 datasets found
  1. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

    Database Management Systems

    As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  2. Data Migration Tool Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 15, 2025
    + more versions
    Cite
    Archive Market Research (2025). Data Migration Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/data-migration-tool-59372
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    Mar 15, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global data migration tool market is experiencing robust growth, driven by the increasing volume of data generated across various industries and the rising need for efficient and secure data transfer between systems. The market size in 2025 is estimated at $15 billion, exhibiting a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This growth is fueled by several key factors, including the rising adoption of cloud computing, the increasing demand for data analytics and business intelligence, and the growing need for data modernization initiatives across enterprises. The diverse range of applications across healthcare, retail, finance, and manufacturing further contributes to the market's expansion. The market is segmented by deployment type (on-premises, self-scripted, cloud-based) and application, with cloud-based solutions gaining significant traction due to their scalability, flexibility, and cost-effectiveness. Leading vendors such as AWS, Microsoft Azure, IBM, and Informatica are actively driving innovation within the market, fostering competitive landscapes and continuous product enhancements. Significant trends shaping the market include the increasing adoption of automation in data migration processes, the rise of AI-powered data migration tools, and a growing focus on data security and compliance. While challenges like data integration complexity, cost of implementation, and potential data loss during migration persist, the overwhelming benefits of streamlined data management are driving market adoption. The forecast period anticipates continued market expansion as businesses increasingly leverage data migration tools to enhance operational efficiency, gain competitive advantage through data-driven insights, and ensure data integrity and security across their evolving IT infrastructures. 
The North American region is currently the leading market, followed by Europe and Asia-Pacific, with all regions demonstrating substantial growth potential in the coming years.
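    The report's headline figures can be sanity-checked with the standard compound-growth formula. The sketch below uses only the numbers quoted in the summary above (USD 15 billion in 2025, 15% CAGR to 2033), not data from the report itself:

```python
# Project a market size forward from a base year at a fixed CAGR.
def project_market_size(base: float, cagr: float, years: int) -> float:
    """Compound `base` forward by `years` at annual growth rate `cagr`."""
    return base * (1 + cagr) ** years

base_2025 = 15.0   # USD billions, 2025 estimate from the summary above
cagr = 0.15        # 15% CAGR over 2025-2033
size_2033 = project_market_size(base_2025, cagr, 2033 - 2025)
print(f"Implied 2033 market size: ${size_2033:.1f}B")  # → $45.9B
```

    Eight compounding periods at 15% roughly triple the base value, which is why the forecast window matters as much as the rate itself.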

  3. Data from: Improper data practices erode the quality of global ecological...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Jul 25, 2025
    Cite
    Steven Augustine; Isaac Bailey-Marren; Katherine Charton; Nathan Kiel; Michael Peyton (2025). Improper data practices erode the quality of global ecological databases and impede the progress of ecological research [Dataset]. http://doi.org/10.5061/dryad.wdbrv15w1
    Explore at:
    Dataset updated
    Jul 25, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Steven Augustine; Isaac Bailey-Marren; Katherine Charton; Nathan Kiel; Michael Peyton
    Time period covered
    Jan 1, 2023
    Description

    The scientific community has entered an era of big data. However, with big data comes big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quantity of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY either lacked applicability or traceability, leaving only 22.9% of the original data usable compared to the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up onl...

    SLA data were downloaded from TRY (traits 3115, 3116, and 3117) for all conifer (Araucariaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae), Plantago, Poa, and Quercus species. The data have not been processed in any way, but additional columns have been added to the dataset that provide the viewer with information about where each data point came from, how it was cited, how it was measured, whether it was uploaded correctly, whether it had already been uploaded to TRY, and whether it was uploaded by the individual who collected the data.

    There are two additional documents associated with this publication. One is a Word document that includes a description of each of the 120 datasets that contained SLA data for the four plant groups within the study (conifers, Plantago, Poa, and Quercus). The second is an Excel document that contains the SLA data that were downloaded from TRY and all associated metadata.

    Missing data codes: NA and N/A
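    The applicability/traceability screen described above amounts to a row-level filter over the flag columns the authors added. A minimal sketch, with field names that are assumptions for illustration rather than the dataset's actual headers:

```python
# Keep only SLA records that pass both screens named in the abstract:
# applicable (original, representative, logical, comparable) and
# traceable (published, cited, consistent). Field names are hypothetical.
records = [
    {"sla": 12.4, "applicable": True,  "traceable": True},
    {"sla": 9.1,  "applicable": True,  "traceable": False},   # untraceable
    {"sla": 15.0, "applicable": False, "traceable": True},    # not applicable
]
usable = [r for r in records if r["applicable"] and r["traceable"]]
print(f"{len(usable)}/{len(records)} records usable")  # → 1/3 records usable
```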

  4. Data from: Correlated RNN Framework to Quickly Generate Molecules with...

    • acs.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Chuan Li; Chenghui Wang; Ming Sun; Yan Zeng; Yuan Yuan; Qiaolin Gou; Guangchuan Wang; Yanzhi Guo; Xuemei Pu (2023). Correlated RNN Framework to Quickly Generate Molecules with Desired Properties for Energetic Materials in the Low Data Regime [Dataset]. http://doi.org/10.1021/acs.jcim.2c00997.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Chuan Li; Chenghui Wang; Ming Sun; Yan Zeng; Yuan Yuan; Qiaolin Gou; Guangchuan Wang; Yanzhi Guo; Xuemei Pu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Motivated by the challenge of deep learning in the low data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) correlated by the transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity in the case of very limited data available. To avoid dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is utilized to produce 500,000 molecules to pretrain the RNN, through which the model can learn sufficient structure knowledge. Then the pretrained RNN is fine-tuned on the 303 energetic compounds to generate 7153 molecules similar to the energetic compounds. In order to more reliably screen the molecules with a high detonation velocity, SMILES enumeration augmentation coupled with the pretrained knowledge is utilized to build an RNN-based prediction model, through which R² is boosted from 0.4446 to 0.9572. The comparable performance of the transfer learning strategy based on an existing big database (ChEMBL) in producing energetic molecules and drug-like ones further supports the effectiveness and generality of our strategy in the low data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in detonation velocity. All the source code and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.

  5. Big Data Driven Architecture for Real Time Systemwide Safety Assurance,...

    • data.nasa.gov
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Big Data Driven Architecture for Real Time Systemwide Safety Assurance, Phase I [Dataset]. https://data.nasa.gov/dataset/Big-Data-Driven-Architecture-for-Real-Time-Systemw/muws-ntpg
    Explore at:
    Available download formats: csv, application/rssxml, json, tsv, xml, application/rdfxml
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    NASA has the aim of researching aviation Real-time System-wide Safety Assurance (RSSA) with a focus on the development of prognostic decision support tools as one of its new aeronautics research pillars. The vision of RSSA is to accelerate the discovery of previously unknown safety threats in real time and enable rapid mitigation of safety risks through analysis of massive amounts of aviation data. Our innovation supports this vision by designing a hybrid architecture combining traditional database technology and real-time streaming analytics in a Big Data environment. The innovation includes three major components: a Batch Processing framework, Traditional Databases and Streaming Analytics. It addresses at least three major needs within the aviation safety community. First, the innovation supports the creation of future data-driven safety prognostic decision support tools that must pull data from heterogeneous data sources and seamlessly combine them to be effective for NAS stakeholders. Second, our innovation opens up the possibility to provide real-time NAS performance analytics desired by key aviation stakeholders. Third, our proposed architecture provides a mechanism for safety risk accuracy evaluations. To accomplish this innovation, we have three technical objectives and related work plan efforts. The first objective is the determination of the system and functional requirements. We identify the system and functional requirements from aviation safety stakeholders for a set of use cases by investigating how they would use the system and what data processing functions they need to support their decisions. The second objective is to create a Big Data technology-driven architecture. Here we explore and identify the best technologies for the components in the system including Big Data processing and architectural techniques adapted for aviation data applications. Finally, our third objective is the development and demonstration of a proof-of-concept.

  6. Machine Learning Study of Metabolic Networks vs ChEMBL Data of Antibacterial...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated Jun 5, 2023
    Cite
    Karel Diéguez-Santana; Gerardo M. Casañola-Martin; Roldan Torres; Bakhtiyor Rasulev; James R. Green; Humbert González-Díaz (2023). Machine Learning Study of Metabolic Networks vs ChEMBL Data of Antibacterial Compounds [Dataset]. http://doi.org/10.1021/acs.molpharmaceut.2c00029.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    ACS Publications
    Authors
    Karel Diéguez-Santana; Gerardo M. Casañola-Martin; Roldan Torres; Bakhtiyor Rasulev; James R. Green; Humbert González-Díaz
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Antibacterial drugs (AD) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria increase interest in understanding metabolic network (MN) mutations and the interaction of AD vs MN. In this study, we employed the IFPTML = Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML) algorithm on a huge dataset from the ChEMBL database, which contains 155,000 AD assays vs >40 MNs of multiple bacteria species. We built a linear discriminant analysis (LDA) and 17 ML models centered on the linear index and based on atoms to predict antibacterial compounds. The IFPTML-LDA model presented the following results for the training subset: specificity (Sp) = 76% out of 70,000 cases, sensitivity (Sn) = 70%, and accuracy (Acc) = 73%. The same model also presented the following results for the validation subsets: Sp = 76%, Sn = 70%, and Acc = 73.1%. Among the IFPTML nonlinear models, the k-nearest neighbors (KNN) model showed the best results, with Sn = 99.2%, Sp = 95.5%, Acc = 97.4%, and area under the receiver operating characteristic curve (AUROC) = 0.998 in training sets. In the validation series, the random forest had the best results: Sn = 93.96% and Sp = 87.02% (AUROC = 0.945). The IFPTML linear and nonlinear models regarding the ADs vs MNs have good statistical parameters, and they could contribute toward finding new metabolic mutations in antibiotic resistance and reducing time/costs in antibacterial drug research.

  7. Identifiers for the 21st century: How to design, provision, and reuse...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Julie A. McMurry; Nick Juty; Niklas Blomberg; Tony Burdett; Tom Conlin; Nathalie Conte; Mélanie Courtot; John Deck; Michel Dumontier; Donal K. Fellows; Alejandra Gonzalez-Beltran; Philipp Gormanns; Jeffrey Grethe; Janna Hastings; Jean-Karim Hériché; Henning Hermjakob; Jon C. Ison; Rafael C. Jimenez; Simon Jupp; John Kunze; Camille Laibe; Nicolas Le Novère; James Malone; Maria Jesus Martin; Johanna R. McEntyre; Chris Morris; Juha Muilu; Wolfgang Müller; Philippe Rocca-Serra; Susanna-Assunta Sansone; Murat Sariyar; Jacky L. Snoep; Stian Soiland-Reyes; Natalie J. Stanford; Neil Swainston; Nicole Washington; Alan R. Williams; Sarala M. Wimalaratne; Lilly M. Winfree; Katherine Wolstencroft; Carole Goble; Christopher J. Mungall; Melissa A. Haendel; Helen Parkinson (2023). Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data [Dataset]. http://doi.org/10.1371/journal.pbio.2001414
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS Biology
    Authors
    Julie A. McMurry; Nick Juty; Niklas Blomberg; Tony Burdett; Tom Conlin; Nathalie Conte; Mélanie Courtot; John Deck; Michel Dumontier; Donal K. Fellows; Alejandra Gonzalez-Beltran; Philipp Gormanns; Jeffrey Grethe; Janna Hastings; Jean-Karim Hériché; Henning Hermjakob; Jon C. Ison; Rafael C. Jimenez; Simon Jupp; John Kunze; Camille Laibe; Nicolas Le Novère; James Malone; Maria Jesus Martin; Johanna R. McEntyre; Chris Morris; Juha Muilu; Wolfgang Müller; Philippe Rocca-Serra; Susanna-Assunta Sansone; Murat Sariyar; Jacky L. Snoep; Stian Soiland-Reyes; Natalie J. Stanford; Neil Swainston; Nicole Washington; Alan R. Williams; Sarala M. Wimalaratne; Lilly M. Winfree; Katherine Wolstencroft; Carole Goble; Christopher J. Mungall; Melissa A. Haendel; Helen Parkinson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

  8. Leading countries by number of data centers 2025

    • statista.com
    • ai-chatbox.pro
    Updated Mar 21, 2025
    Cite
    Statista (2025). Leading countries by number of data centers 2025 [Dataset]. https://www.statista.com/statistics/1228433/data-centers-worldwide-by-country/
    Explore at:
    Dataset updated
    Mar 21, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    Worldwide
    Description

    As of March 2025, there were a reported 5,426 data centers in the United States, the most of any country worldwide. A further 529 were located in Germany, while 523 were located in the United Kingdom.

    What is a data center?

    A data center is a network of computing and storage resources that enables the delivery of shared software applications and data. These facilities can house large amounts of critical and important data, and therefore are vital to the daily functions of companies and consumers alike. As a result, whether it is a cloud, colocation, or managed service, data center real estate will have increasing importance worldwide.

    Hyperscale data centers

    In the past, data centers were highly controlled physical infrastructures, but the cloud has since changed that model. A cloud data service is a remote version of a data center, located somewhere away from a company's physical premises. Cloud IT infrastructure spending has grown and is forecast to rise further in the coming years. The evolution of technology, along with the rapid growth in demand for data across the globe, is largely driven by the leading hyperscale data center providers.

  9. MySQL Training Service Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 23, 2024
    Cite
    Dataintelo (2024). MySQL Training Service Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-mysql-training-service-market
    Explore at:
    Available download formats: pptx, pdf, csv
    Dataset updated
    Sep 23, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    MySQL Training Service Market Outlook



    The global MySQL Training Service market size was valued at USD 1.2 billion in 2023 and is projected to reach USD 2.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 9.2% during the forecast period. The substantial growth in this market can be attributed to the increasing demand for database management skills across various industries. As organizations increasingly rely on data-driven decision-making, the need for skilled professionals who can handle and manipulate MySQL databases has become more critical, driving the demand for specialized training services.
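    The endpoint figures above can be cross-checked with the standard CAGR formula. Note that report CAGRs are often computed over a forecast window with a different base year than the one quoted, so the endpoint-implied rate landing slightly above 9.2% here is expected rather than an error:

```python
# Implied compound annual growth rate from two endpoint values.
def implied_cagr(start: float, end: float, years: int) -> float:
    return (end / start) ** (1 / years) - 1

# USD 1.2B (2023) to USD 2.8B (2032), per the summary above.
rate = implied_cagr(1.2, 2.8, 2032 - 2023)
print(f"Endpoint-implied CAGR: {rate:.1%}")  # → 9.9%
```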



    One of the primary growth factors for the MySQL Training Service market is the rapid digital transformation across industries. Enterprises are increasingly adopting digital technologies to enhance operational efficiency, improve customer experience, and gain a competitive edge. This digital shift necessitates a strong foundation in database management, propelling the demand for MySQL training services. Additionally, the proliferation of big data analytics, cloud computing, and Internet of Things (IoT) technologies has further accentuated the need for proficient MySQL professionals.



    Another significant driver is the widespread adoption of MySQL as a preferred database management system. Known for its reliability, scalability, and open-source nature, MySQL has become a staple in various industry verticals, including IT and telecommunications, BFSI, healthcare, retail, and manufacturing. As more organizations integrate MySQL into their IT infrastructure, the demand for training services to upskill employees and ensure optimal database performance has surged. This trend is particularly prominent among enterprises that prioritize cost-effective and efficient database solutions.



    The increasing emphasis on data security and compliance also plays a crucial role in the market's growth. With stringent regulatory requirements and the rising threat of cyberattacks, organizations are keen on equipping their workforce with the necessary skills to secure and manage their databases effectively. MySQL training services offer specialized courses that cover security best practices, data encryption, and compliance frameworks, thereby addressing a critical need in the market. This focus on security and compliance is expected to drive sustained demand for MySQL training services in the coming years.



    From a regional perspective, North America holds a significant share of the MySQL Training Service market, owing to the high concentration of technology companies and the early adoption of digital technologies. However, the Asia Pacific region is anticipated to exhibit the highest growth rate during the forecast period. This growth can be attributed to the rapid economic development in countries like India and China, the increasing penetration of internet services, and the expanding IT industry. The growing number of startups and small and medium-sized enterprises (SMEs) in the region also contribute to the burgeoning demand for MySQL training services.



    Training Type Analysis



    In terms of training type, the MySQL Training Service market is segmented into online training, classroom training, and corporate training. Online training has gained significant traction in recent years, driven by the convenience and flexibility it offers. With the rise of e-learning platforms and the increasing availability of high-speed internet, professionals can now access MySQL training modules from the comfort of their homes or offices. This mode of training is particularly popular among working professionals who seek to upskill without disrupting their work schedule. Additionally, online training often comes with interactive features like live sessions, discussion forums, and virtual labs, enhancing the learning experience.



    Classroom training, on the other hand, continues to be a preferred choice for individuals who benefit from face-to-face interactions with instructors and peers. This traditional mode of training is particularly effective for hands-on learning, where participants can engage in real-time problem-solving and receive immediate feedback. Classroom training programs are commonly offered by academic institutions, training centers, and specialized boot camps. Despite the growing popularity of online training, classroom training remains relevant due to its structured approach and the personal touch it provides.



    Corporate training is another critical segment in the MySQL Training Service market. Enterprises often invest in corporate training p

  10. DataSheet_2_Artificial Intelligence Combined With Big Data to Predict Lymph...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Liwei Wei; Yongdi Huang; Zheng Chen; Hongyu Lei; Xiaoping Qin; Lihong Cui; Yumin Zhuo (2023). DataSheet_2_Artificial Intelligence Combined With Big Data to Predict Lymph Node Involvement in Prostate Cancer: A Population-Based Study.xlsx [Dataset]. http://doi.org/10.3389/fonc.2021.763381.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Liwei Wei; Yongdi Huang; Zheng Chen; Hongyu Lei; Xiaoping Qin; Lihong Cui; Yumin Zhuo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    A more accurate preoperative prediction of lymph node involvement (LNI) in prostate cancer (PCa) would improve clinical treatment and follow-up strategies of this disease. We developed a predictive model based on machine learning (ML) combined with big data to achieve this.

    Methods

    Clinicopathological characteristics of 2,884 PCa patients who underwent extended pelvic lymph node dissection (ePLND) were collected from the U.S. National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database from 2010 to 2015. Eight variables were included to establish an ML model. Model performance was evaluated by receiver operating characteristic (ROC) curves and calibration plots for predictive accuracy. Decision curve analysis (DCA) and cutoff values were obtained to estimate its clinical utility.

    Results

    Three hundred and forty-four (11.9%) patients were identified with LNI. The five most important factors were the Gleason score, T stage of disease, percentage of positive cores, tumor size, and prostate-specific antigen levels with 158, 137, 128, 113, and 88 points, respectively. The XGBoost (XGB) model showed the best predictive performance and had the highest net benefit when compared with the other algorithms, achieving an area under the curve of 0.883. With a 5%~20% cutoff value, the XGB model performed best in reducing omissions and avoiding overtreatment of patients when dealing with LNI. This model also had a lower false-negative rate, and a higher percentage of ePLND was avoided. In addition, DCA showed it has the highest net benefit across the whole range of threshold probabilities.

    Conclusions

    We established an ML model based on big data for predicting LNI in PCa, and it could lead to a reduction of approximately 50% of ePLND cases. In addition, only ≤3% of patients were misdiagnosed with a cutoff value ranging from 5% to 20%. This promising study warrants further validation by using a larger prospective dataset.
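    The 5%-20% cutoff the study varies is simply a threshold on the model's predicted LNI probability. A minimal sketch of how such a cutoff drives the ePLND decision (the risk scores below are invented for illustration, not the study's outputs):

```python
# Flag patients for ePLND when predicted LNI risk meets the cutoff.
def flag_for_eplnd(risk_scores, cutoff):
    """Return indices of patients whose predicted LNI probability >= cutoff."""
    return [i for i, p in enumerate(risk_scores) if p >= cutoff]

scores = [0.02, 0.07, 0.18, 0.55, 0.04]    # hypothetical model outputs
print(flag_for_eplnd(scores, cutoff=0.05))  # → [1, 2, 3]
print(flag_for_eplnd(scores, cutoff=0.20))  # → [3]
```

    Raising the cutoff trades missed LNI cases (false negatives) against avoided dissections, which is exactly what the decision curve analysis quantifies.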

  11. S1 Data -

    • plos.figshare.com
    xlsx
    Updated Jan 25, 2024
    + more versions
    Cite
    Yuxiang Zhang; Anhang Chen; Linzhen Li; Huiqin Zhang (2024). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0284148.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yuxiang Zhang; Anhang Chen; Linzhen Li; Huiqin Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Owing to the increasingly complex economic environment and difficult employment situation, a large number of new occupations have emerged in China, leading to job diversification. Currently, the overall development status of new occupations in China and the structural characteristics of new occupation practitioners in different cities are still unclear. This study first constructed a development index system for new occupation practitioners from five dimensions (group size, cultural appreciation, salary level, occupation perception, and environmental perception). Relevant data to compare and analyze the development status of new occupation practitioners were derived from big data mining of China’s mainstream recruitment platforms and from a questionnaire survey of new occupation practitioners in four first-tier cities and 15 new first-tier cities in China. The results show that the development level of new occupation practitioners in the four first-tier cities is the highest, and the two new first-tier cities, Chengdu and Hangzhou, have outstanding performance. The cities with the best development level of new occupation practitioners in Eastern, Central, and Western China are Shanghai, Wuhan, and Chengdu, respectively. Most new occupation practitioners in China are confident about the future of their careers. However, more than half of the 19 cities are uncoordinated in the five dimensions of the development of new occupation practitioners, especially those cities with middle development levels. A good policy environment and social environment have not yet been formulated to ensure the sustainable development of new occupation practitioners. Finally, we proposed the following countermeasures and suggestions: (1) Establish a classified database of new occupation talents. (2) Implement a talent industry agglomeration strategy. (3) Pay attention to the coordinated development of new occupation practitioners in cities.

  12. Taiwan Number Dataset

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Taiwan Number Dataset [Dataset]. https://listtodata.com/taiwan-dataset
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Taiwan
    Variables measured
    Phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    The Taiwan number dataset will help you generate sales leads. First of all, you can text buyers with product information and descriptions through this dataset, which is essential for driving a telemarketing campaign today. Moreover, you can call and message directly with the help of this Taiwan number dataset, letting your audience know the features and uses of your product. Above all, this easily widens your marketing reach, and you can build a bond with your clients and earn their trust with this mobile phone number list. Taiwan phone data has the potential to bring in valuable customers: a business can earn more money without spending too much on ads, and an SMS marketing plan is the best option because it makes it possible to run promotions cheaply. So take this contact number directory at an affordable cost and try it for yourself. Taiwan phone data will sustain your telemarketing with useful details. If you need to reach someone as soon as possible, a phone number is the best choice, and you can send messages directly to their inbox through this dataset. Therefore, the numbers in our Taiwan phone data will greatly aid your marketing efforts. Overall, you can use List To Data for your product publicity and find curious buyers among the contacts. The Taiwan phone number list is a top-notch mobile database, and the List To Data website is committed to giving clients the best service for their money. We maintain a 24/7 active support team to ensure that; you can ask them anything about this package, or even request samples of the leads, which are 95% real. Both your branding and your sales will be enhanced by this Taiwan phone number list, so make a good decision for your business and collect this lead list right now.
    Further, the Taiwan phone number list will let you keep promoting your products all across the country. The user base of these platforms is so big that it provides you with a large customer pool, which will clearly raise the possibility of finding interested customers.

  13. B2B Leads Database | 500M+ B2B Contact Profiles | 100M+ B2B Mobile Numbers |...

    • datarade.ai
    .csv, .xls
    Updated Feb 24, 2022
    Cite
    Lead for Business (2022). B2B Leads Database | 500M+ B2B Contact Profiles | 100M+ B2B Mobile Numbers | 100% Real-Time Verified Contact Data [Dataset]. https://datarade.ai/data-products/b2b-leads-database-b2b-contact-database-b2b-contact-direc-lead-for-business
    Explore at:
    Available download formats: .csv, .xls
    Dataset updated
    Feb 24, 2022
    Dataset authored and provided by
    Lead for Business
    Area covered
    Jersey, Trinidad and Tobago, Finland, Mozambique, Palestine, Isle of Man, Armenia, Martinique, South Sudan, Northern Mariana Islands
    Description

    • 500M B2B Contacts
    • 35M Companies
    • 20+ Data Points to Filter Your Leads
    • 100M+ Contact Direct Dials and Mobile Numbers
    • Lifetime Support Until You Are 100% Satisfied

    We are the best B2B database providers for high-performance sales teams. Nothing is more frustrating than receiving useless data that you have paid for, and with us you will not have to deal with fake contacts.

    Every 15 days, our devoted team updates our b2b leads database. In addition, we are always available to assist our clients with whatever data they are working with in order to ensure that our service meets their needs. We keep an eye on our b2b contact database to keep you informed and provide any assistance you require.

    With our simple-to-use system and up-to-date B2B contact list, we hope to make your job easier. You’ll be able to filter your data at Lfbbd based on the industry you work in. For example, you can choose from real estate companies or just simply tap into the healthcare business. Our database is updated on a regular basis, and you will receive contact information as soon as possible.

    Use our information to quickly locate new business clients, competitors, and suppliers. We’ve got your back, no matter what precise requirements you have.

    We have over 500 million business-to-business contacts that you may segment based on your marketing and commercial goals. We don’t stop there; we continually gather leads with the right tools so you can reach out to a big database of your clients without worrying about email constraints.

    Thanks to our database, you may create your own campaign and send as many emails or automated messages as you want. We collect the most viable B2B data to help you go a long way, as we seek to grow your business and enhance your sales.

    The majority of our clients choose us since we have competitive costs when compared to others. In this digital era, marketing is more advanced, and customers are less willing to pay more for a service that produces poor results.

    That’s why we’ve devised the most effective b2b database strategy for your company. You can also tailor your database and pricing to meet your specific business requirements.

    • Connect directly with the right decision-makers, using the most accurate database of emails and direct dials. Build a clean prospecting list that you can plug into your sales tools and generate new leads from, right away.
    • Over 500 million business contacts worldwide.
    • Filter your targeted leads by 20+ criteria, including job title, industry, location, revenue, and technology.
    • Find the email addresses of the professionals you want to contact, one by one or in bulk.

  14. Top Rated Movies On TMDB

    • kaggle.com
    Updated Nov 18, 2024
    Cite
    Aditya Gupta (2024). Top Rated Movies On TMDB [Dataset]. https://www.kaggle.com/datasets/aditya8989/top-rated-movies/discussion
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 18, 2024
    Dataset provided by
    Kaggle
    Authors
    Aditya Gupta
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a curated list of the top-rated movies on TMDB (The Movie Database), a popular online movie database known for its comprehensive collection of film data. The dataset includes detailed information about the highest-rated films according to user ratings, focusing on films that have received significant acclaim from viewers.

    This dataset can be helpful to make a movie recommendation model.

  15. Data from: Fort Good Hope

    • erddap.aoos.org
    Updated Dec 31, 2009
    Cite
    NOAA National Climatic Data Center (NCDC) (2009). Fort Good Hope [Dataset]. https://erddap.aoos.org/erddap/info/boem_ahmd_fort_good_hope/index.html
    Explore at:
    Dataset updated
    Dec 31, 2009
    Dataset provided by
    BOEM Arctic Historical Meteorological Database
    Authors
    NOAA National Climatic Data Center (NCDC)
    Time period covered
    Jan 1, 1979 - Dec 31, 2009
    Area covered
    Variables measured
    z, time, station, latitude, longitude, wind_speed, air_pressure, air_temperature, relative_humidity, wind_from_direction, and 4 more
    Description

    Timeseries data from 'Fort Good Hope' (boem_ahmd_fort_good_hope) cdm_altitude_proxy=z cdm_data_type=TimeSeriesProfile cdm_profile_variables=time cdm_timeseries_variables=station,longitude,latitude contributor_email=feedback@axiomdatascience.com contributor_name=Axiom Data Science contributor_role=processor contributor_role_vocabulary=NERC contributor_url=https://www.axiomdatascience.com Conventions=IOOS-1.2, CF-1.6, ACDD-1.3, NCCSV-1.2 defaultDataQuery=lwe_thickness_of_precipitation_amount_cm_time_sum_over_6_hour,air_temperature,air_pressure_at_mean_sea_level,z,wind_speed,time,relative_humidity,surface_snow_thickness,wind_from_direction,air_pressure,dew_point_temperature&time>=max(time)-3days Easternmost_Easting=-128.65 featureType=TimeSeriesProfile geospatial_lat_max=66.233 geospatial_lat_min=66.233 geospatial_lat_units=degrees_north geospatial_lon_max=-128.65 geospatial_lon_min=-128.65 geospatial_lon_units=degrees_east geospatial_vertical_max=2.0 geospatial_vertical_min=0.0 geospatial_vertical_positive=up geospatial_vertical_units=m history=Downloaded from BOEM Arctic Historical Meteorological Database at id=127236 infoUrl=https://sensors.ioos.us/#metadata/127236/station institution=NOAA National Climatic Data Center (NCDC) naming_authority=com.axiomdatascience Northernmost_Northing=66.233 platform=fixed platform_name=Fort Good Hope platform_vocabulary=http://mmisw.org/ont/ioos/platform processing_level=Level 2 references=https://www.ncdc.noaa.gov/,, sourceUrl=https://www.ncdc.noaa.gov/ Southernmost_Northing=66.233 standard_name_vocabulary=CF Standard Name Table v72 station_id=127236 time_coverage_end=2009-12-31T23:00:00Z time_coverage_start=1979-01-01T16:00:00Z Westernmost_Easting=-128.65
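    Records like this one are served through ERDDAP's tabledap interface, so subsets can be downloaded as CSV by appending variables and constraints to the dataset URL (the pattern visible in the defaultDataQuery attribute above). A minimal sketch; the dataset id and variable names come from the metadata above, while the chosen variables and time constraint are illustrative:

```python
# Sketch: compose an ERDDAP tabledap CSV download URL for this station.
BASE = "https://erddap.aoos.org/erddap/tabledap"

def tabledap_csv_url(dataset_id, variables, constraints=()):
    # tabledap query syntax: comma-separated variables, then &-separated constraints
    parts = [",".join(variables)] + list(constraints)
    return f"{BASE}/{dataset_id}.csv?" + "&".join(parts)

url = tabledap_csv_url(
    "boem_ahmd_fort_good_hope",
    ["time", "air_temperature", "wind_speed"],
    ["time>=2009-01-01T00:00:00Z"],
)
print(url)
```

    The same pattern works for any of the variables listed under "Variables measured".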

  16. March Madness Historical DataSet (2002 to 2025)

    • kaggle.com
    Updated Apr 22, 2025
    Cite
    Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Jonathan Pilafas
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo.
    - Click here to view this dashboard: Dashboard Link
    - Click here to view this dashboard's features in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard

    This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. The dataset is meticulously structured to provide every piece of information that I could pull from that site as an open-source tool for March Madness analysis.

    Key features of the dataset include:
    - Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
    - Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of.
    - 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.

    These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, all of the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean-up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

    Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed in one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field, in both its full length and the acronyms it is known by, as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate the historical conferences from their current conferences.

    From there, I joined a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because a coach's active tenure typically correlates with a team's success in the March Madness tournament. I also joined another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and another to flag the teams that were ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season. After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
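    The standardize-append-join pipeline described above can be sketched in pandas; the table shapes, column names, and values here are invented placeholders, not the actual KenPom schema:

```python
import pandas as pd

# Hypothetical per-season efficiency tables: an old season with historical
# headers and the current season with the 2025-style naming (names invented).
eff_2002 = pd.DataFrame({"TeamName": ["Duke", "UConn"], "Season": 2002, "AdjO": [110.3, 108.9]})
eff_2025 = pd.DataFrame({"Team": ["Duke", "UConn"], "Season": 2025, "AdjOE": [121.4, 117.2]})

# 1) Standardize older headers to the current 2025 naming structure.
eff_2002 = eff_2002.rename(columns={"TeamName": "Team", "AdjO": "AdjOE"})

# 2) Append all seasons into one long table.
efficiency = pd.concat([eff_2002, eff_2025], ignore_index=True)

# 3) Join another cleaned source (e.g. a height/experience table) on team + season.
height = pd.DataFrame({"Team": ["Duke", "UConn"], "Season": [2002, 2025], "AvgHeight": [78.2, 77.5]})
combined = efficiency.merge(height, on=["Team", "Season"], how="left")
```

    Joining on the standardized team name plus season is what lets every metric line up in a single row per team-season, as the description notes.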

    This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.

  17. Meta-Information des Samples der Media-Analyse Daten: IntermediaPlus...

    • search.gesis.org
    • pollux-fid.de
    • +1more
    Updated Jun 29, 2020
    Cite
    Brentel, Inga; Kampes, Céline Fabienne; Jandura, Olaf (2020). Meta-Information des Samples der Media-Analyse Daten: IntermediaPlus (2014-2016) [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-2030
    Explore at:
    Dataset updated
    Jun 29, 2020
    Dataset provided by
    GESIS search
    GESIS, Köln
    Authors
    Brentel, Inga; Kampes, Céline Fabienne; Jandura, Olaf
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Description

    The prepared longitudinal dataset for 2014 to 2016 is "big data", which is why the complete dataset will only be available in the form of a database (MySQL). In this database, the information for a respondent's different variables is stored in one column, stacked one below the other. The present publication comprises an SQL database with the metadata of a sample of the complete dataset, which covers a subset of the available variables and is intended to show the structure of the prepared data, together with a data documentation (codebook) of the sample. For this purpose, the sample contains all variables on sociodemography, leisure activities, additional information about a respondent and their household, as well as the interview-specific variables and weights. Only the variables on the respondent's media use are a small selection: for online media use, the variables of all aggregate offerings as well as of the individual offerings in the politics and digital genres were included. The media use of radio, print, and TV was not included in the sample, since its structure can be traced using the published longitudinal data of the media analyses MA Radio, MA Pressemedien, and MA Intermedia.
    Due to the size of the data material, a database with the actual survey data would already be in the critical file-size range for normal upload and download. The actual survey results required for analysis will therefore be published in 2021, as the complete Media-Analyse dataset IntermediaPlus (2014-2016), in the DBK at GESIS.

    The data and their preparation are proposed as a best-practice case for big-data management, that is, for handling big data in the social sciences and with social-science data. The GESIS software CharmStats, which was extended with big-data features as part of this project, is used to document the harmonization work and make it transparent. A Python script and an HTML template further automated the workflow around and with CharmStats.

    The prepared longitudinal version of the complete MA IntermediaPlus dataset for 2014 to 2016 will be published in 2021 in cooperation with GESIS and made available in accordance with the FAIR principles (Wilkinson et al. 2016). By harmonizing the individual cross-sections, the aim is to make the Media-Analyse data source, prepared by Inga Brentel and Céline Fabienne Kampes as part of the dissertation project "Angebots- und Publikumsfragmentierung online", accessible for research on social and media change in the Federal Republic of Germany.

    Future study number of the complete IntermediaPlus dataset in the GESIS DBK: ZA5769 (Version 1-0-0); doi: https://dx.doi.org/10.4232/1.13530
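    The long, stacked storage described above (one variable value per row per respondent) can be pivoted back into the familiar one-row-per-respondent form; a minimal pandas sketch with invented variable names, not the actual MA IntermediaPlus schema:

```python
import pandas as pd

# Long format: each row holds one (respondent, variable, value) triple,
# mirroring how the MySQL database stacks a respondent's variables.
long_df = pd.DataFrame({
    "respondent_id": [1, 1, 1, 2, 2, 2],
    "variable": ["age", "sex", "weight_var", "age", "sex", "weight_var"],
    "value": ["34", "f", "0.97", "51", "m", "1.12"],
})

# Pivot to the conventional wide layout: one row per respondent,
# one column per variable.
wide = long_df.pivot(index="respondent_id", columns="variable", values="value").reset_index()
```

    The same reshaping can be done inside MySQL with conditional aggregation, but a pivot on an extracted frame is usually simpler for analysis.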

  18. Customer information database.

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    + more versions
    Cite
    Huijun Chen (2023). Customer information database. [Dataset]. http://doi.org/10.1371/journal.pone.0285506.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Huijun Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Technological development in the new economic era has brought challenges to enterprises, which need to use massive, effective consumption information to provide customers with high-quality customized services. Big data technology has strong mining capabilities. This study summarizes the relevant theory of computer data mining to optimize enterprise marketing strategy and analyzes the application of data mining in precision marketing services. Extreme Gradient Boosting (XGBoost) has shown strong advantages among machine learning algorithms. To help enterprises analyze customer data quickly and accurately, XGBoost's feature-importance feedback is used to identify the main factors that affect customer card activation, and these factors are then analyzed in detail. The resulting analysis points the way to effective marketing aimed at potential customers who have yet to activate. Finally, the performance of XGBoost is compared with three other methods, and the seven features contributing most to the predictions are tested for differences. The results show that: (1) the accuracy and recall of the proposed model are higher than those of the other algorithms, giving the best overall performance; (2) the significance p-values of the tested features are all below 0.001, indicating a highly significant difference between the proposed features and whether or not a card is activated. The contributions of this paper are twofold: 1. Four precision marketing strategies based on big data mining are designed to provide scientific support for enterprise decision-making. 2. Improving the connection rate and stickiness between enterprises and customers plays a major driving role in overall customer marketing.
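    The feedback loop described above (train a gradient-boosted model, then read back which features drive activation) can be sketched as follows; the study uses XGBoost, but scikit-learn's GradientBoostingClassifier stands in here, and the customer features and data are synthetic placeholders:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
feature_names = ["monthly_spend", "visit_count", "tenure_months", "age"]

# Synthetic stand-in data: activation is driven mainly by spend, then visits.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Rank features by importance, as the study does to surface activation drivers.
ranking = sorted(zip(feature_names, model.feature_importances_), key=lambda t: -t[1])
```

    The top-ranked features are then the candidates for the significance tests and targeted marketing strategies the abstract describes.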

  19. Data from: Machine Learning-Driven Discovery and Database of Cyanobacteria...

    • figshare.com
    xlsx
    Updated Nov 27, 2024
    Cite
    Renato Soares; Luísa Azevedo; Vitor Vasconcelos; Diogo Pratas; Sérgio F. Sousa; João Carneiro (2024). Machine Learning-Driven Discovery and Database of Cyanobacteria Bioactive Compounds: A Resource for Therapeutics and Bioremediation [Dataset]. http://doi.org/10.1021/acs.jcim.4c00995.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    ACS Publications
    Authors
    Renato Soares; Luísa Azevedo; Vitor Vasconcelos; Diogo Pratas; Sérgio F. Sousa; João Carneiro
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Cyanobacteria strains have the potential to produce bioactive compounds that can be used in therapeutics and bioremediation. Therefore, compiling all information about these compounds to consider their value as bioresources for industrial and research applications is essential. In this study, a searchable, updated, curated, and downloadable database of cyanobacteria bioactive compounds was designed, along with a machine-learning model to predict the compounds’ targets of newly discovered molecules. A Python programming protocol obtained 3431 cyanobacteria bioactive compounds, 373 unique protein targets, and 3027 molecular descriptors. PaDEL-descriptor, Mordred, and Drugtax software were used to calculate the chemical descriptors for each bioactive compound database record. The biochemical descriptors were then used to determine the most promising protein targets for human therapeutic approaches and environmental bioremediation using the best machine learning (ML) model. The creation of our database, coupled with the integration of computational docking protocols, represents an innovative approach to understanding the potential of cyanobacteria bioactive compounds. This resource, adhering to the findability, accessibility, interoperability, and reuse of digital assets (FAIR) principles, is an excellent tool for pharmaceutical and bioremediation researchers. Moreover, its capacity to facilitate the exploration of specific compounds’ interactions with environmental pollutants is a significant advancement, aligning with the increasing reliance on data science and machine learning to address environmental challenges. This study is a notable step forward in leveraging cyanobacteria for both therapeutic and ecological sustainability.
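    The prediction step described above (descriptor vectors in, protein-target class out) can be sketched with an off-the-shelf classifier; the descriptor matrix and target labels below are invented placeholders, and a random forest stands in for whichever ML model performed best in the study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Invented stand-ins: rows are compounds, columns are molecular descriptors
# (the study computes ~3000 descriptors with PaDEL-descriptor, Mordred, and DrugTax).
X = rng.normal(size=(120, 10))
# Invented labels for three hypothetical protein-target classes.
y = rng.integers(0, 3, size=120)

clf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)
pred = clf.predict(X[:5])   # predicted target class for five compounds
```

    With real descriptor tables from the database, the same fit/predict pattern assigns a likely protein target to a newly discovered compound.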

  20. Spotify Top 50 Tracks 2023

    • kaggle.com
    Updated Feb 8, 2024
    Cite
    yuka_with_data (2024). Spotify Top 50 Tracks 2023 [Dataset]. https://www.kaggle.com/datasets/yukawithdata/spotify-top-tracks-2023
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Kaggle
    Authors
    yuka_with_data
    Description

    💁‍♀️Please take a moment to carefully read through this description and metadata to better understand the dataset and its nuances before proceeding to the Suggestions and Discussions section.

    Dataset Description:

    This dataset compiles the tracks from Spotify's official "Top Tracks of 2023" playlist, showcasing the most popular and influential music of the year according to Spotify's streaming data. It represents a wide array of genres, artists, and musical styles that defined the musical landscape of 2023. Each track in the dataset is detailed with a variety of audio features, popularity scores, and metadata. This dataset serves as an excellent resource for music enthusiasts, data analysts, and researchers aiming to explore music trends or develop music recommendation systems based on empirical data.

    Data Collection and Processing:

    Obtaining the Data:

    The data was obtained directly from the Spotify Web API, specifically from the "Top Tracks of 2023" official playlist curated by Spotify. The Spotify API provides detailed information about tracks, artists, and albums through various endpoints.

    Data Processing:

    To process and structure the data, I developed Python scripts using data science libraries such as pandas for data manipulation and spotipy for API interactions specifically for Spotify data retrieval.

    Workflow:

    1. Authentication
    2. API Requests
    3. Data Cleaning and Transformation
    4. Saving the Data
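    The four workflow steps above can be sketched as follows; the spotipy calls are shown but commented out so the sketch runs without credentials, and the mock item, chosen fields, and file name are illustrative assumptions, not the dataset's actual pipeline:

```python
import pandas as pd

def tracks_to_dataframe(items):
    """Step 3: flatten playlist items (Spotify JSON) into a tidy table."""
    rows = []
    for item in items:
        track = item["track"]
        rows.append({
            "artist_name": track["artists"][0]["name"],
            "track_name": track["name"],
            "is_explicit": track["explicit"],
            "popularity": track["popularity"],
        })
    return pd.DataFrame(rows)

# Steps 1-2 (authentication and API requests) would look roughly like this
# with real credentials; left commented out so the sketch stays runnable:
# import spotipy
# from spotipy.oauth2 import SpotifyClientCredentials
# sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
#     client_id="<id>", client_secret="<secret>"))
# items = sp.playlist_items("<playlist-id>")["items"]

# A mock item in the same JSON shape, for demonstration:
items = [{"track": {"artists": [{"name": "Artist A"}], "name": "Song A",
                    "explicit": False, "popularity": 95}}]
df = tracks_to_dataframe(items)
df.to_csv("top_tracks_2023.csv", index=False)   # Step 4: saving the data
```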

    Attribute Descriptions:

    • artist_name: the artist name
    • track_name: the title of the track
    • is_explicit: Indicates whether the track contains explicit content
    • album_release_date: The date when the track was released
    • genres: A list of genres associated with the track's artist(s)
    • danceability: A measure from 0.0 to 1.0 indicating how suitable a track is for dancing based on a combination of musical elements
    • valence: A measure from 0.0 to 1.0 indicating the musical positiveness conveyed by a track
    • energy: A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity
    • loudness: The overall loudness of a track in decibels (dB)
    • acousticness: A measure from 0.0 to 1.0 whether the track is acoustic.
    • instrumentalness: Predicts whether a track contains no vocals
    • liveness: Detects the presence of an audience in the recordings
    • speechiness: Detects the presence of spoken words in a track
    • key: The key the track is in. Integers map to pitches using standard Pitch Class notation.
    • tempo: The overall estimated tempo of a track in beats per minute (BPM)
    • mode: Modality of the track
    • duration_ms: The length of the track in milliseconds
    • time_signature: An estimated overall time signature of a track
    • popularity: A score between 0 and 100, with 100 being the most popular

    Possible Data Projects

    • Trends Analysis
    • Genre Popularity
    • Mood and Music
    • Comparison with other tracks

    Disclaimer and Responsible Use:

    • This dataset, derived from Spotify's "Top Tracks of 2023" playlist, is intended for educational, research, and analysis purposes only. Users are urged to use this data responsibly and ethically.
    • Users should comply with Spotify's Terms of Service and Developer Policies when using this dataset.
    • The dataset includes music track information such as names and artist details, which are subject to copyright. While the dataset presents this information for analytical purposes, it does not convey any rights to the music itself.
    • Users of the dataset must ensure that their use does not infringe on the rights of copyright holders. Any analysis, distribution, or derivative work should respect the intellectual property rights of all parties and comply with applicable laws.
    • The dataset is provided "as is," without warranty, and the creator disclaims any legal liability for the use of the dataset by others. Users are responsible for ensuring their use of the dataset is legal and ethical.
    • For the most accurate and up-to-date information regarding Spotify's music, playlists, and policies, users are encouraged to refer directly to Spotify's official website. This ensures that users have access to the latest details directly from the source.
    • The creator/maintainer of this dataset is not affiliated with Spotify, any third-party entities, or artists mentioned within the dataset. This project is independent and has not been authorized, sponsored, or otherwise approved by Spotify or any other mentioned entities.

    Contribution

    I encourage users who discover new insights, propose dataset enhancements, or craft analytics that illuminate aspects of the dataset's focus to share their findings with the community.
    - Kaggle Notebooks: To facilitate sharing and collaboration, users are encouraged to create and share their analyses through Kaggle notebooks. For ease of use, start your notebook by clicking "New Notebook" atop this dataset’s page on K...
