As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle, and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.
Database Management Systems
As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL and related query languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way consumers access information through applications, which further illustrates the importance of the software.
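For readers new to the space, the SQL skills mentioned above boil down to creating, updating, and querying tables. Below is a minimal sketch using Python's built-in sqlite3 module as a lightweight stand-in for a full DBMS such as Oracle, MySQL, or PostgreSQL; the table and column names are invented for the example.

```python
import sqlite3

# Lightweight stand-in for a full DBMS: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Organize: define a table and load a few rows (names/columns are illustrative).
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
cur.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Alice", "DE"), ("Bob", "US"), ("Carla", "US")],
)
conn.commit()

# Update and query: the core operations a DBMS exposes to applications.
cur.execute("UPDATE customers SET country = 'CA' WHERE name = 'Bob'")
cur.execute("SELECT country, COUNT(*) FROM customers GROUP BY country ORDER BY 2 DESC")
print(cur.fetchall())  # e.g. [('US', 1), ('DE', 1), ('CA', 1)]
conn.close()
```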
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Database Management System (DBMS) market size reached USD 85.5 billion in 2024, reflecting the sector’s robust expansion across various industries. The market is expected to grow at a CAGR of 11.8% from 2025 to 2033, culminating in a forecasted market size of USD 231.7 billion by 2033. This impressive growth is primarily driven by the escalating volume of data generated by digital transformation initiatives, rising adoption of cloud-based solutions, and the increasing complexity of enterprise data ecosystems.
One of the key growth factors for the Database Management System market is the proliferation of big data analytics and the need for real-time data processing. Organizations across sectors such as BFSI, healthcare, retail, and manufacturing are leveraging advanced DBMS solutions to derive actionable insights from massive datasets. The integration of artificial intelligence and machine learning into database management systems is further enhancing their analytical capabilities, enabling predictive analytics, automated data governance, and anomaly detection. As businesses continue to digitize their operations, the demand for scalable, secure, and high-performance DBMS platforms is expected to surge, fueling market expansion.
Another significant driver is the widespread migration to cloud-based database architectures. Enterprises are increasingly opting for cloud deployment due to its flexibility, cost-effectiveness, and ease of scalability. Cloud-based DBMS solutions allow organizations to manage data across multiple geographies with minimal infrastructure investment, supporting global expansion and remote work trends. The growth of hybrid and multi-cloud environments is also propelling the need for database management systems that can seamlessly integrate and synchronize data across diverse platforms. This shift is compelling vendors to innovate and offer more robust, cloud-native DBMS offerings.
The evolution of database types, particularly the rise of NoSQL and in-memory databases, is transforming the DBMS market landscape. Traditional relational databases are now complemented by NoSQL databases that cater to unstructured and semi-structured data, supporting use cases in IoT, social media, and real-time analytics. In-memory databases, known for their low latency and high throughput, are gaining traction in applications requiring instantaneous data access. This diversification of database technologies is enabling organizations to choose best-fit solutions for their specific needs, contributing to the overall growth and dynamism of the market.
From a regional perspective, North America dominates the Database Management System market due to its advanced IT infrastructure, high cloud adoption rates, and strong presence of major technology providers. However, Asia Pacific is witnessing the fastest growth, driven by rapid digitalization in emerging economies, increasing investments in IT modernization, and the expansion of e-commerce and fintech sectors. Europe, Latin America, and the Middle East & Africa are also experiencing steady growth, supported by regulatory compliance initiatives and the modernization of legacy systems. The global nature of data-driven business models ensures that demand for sophisticated DBMS solutions remains strong across all regions.
The Database Management System market by component is segmented into software and services, each playing a pivotal role in the overall ecosystem. The software segment encompasses various types of DBMS platforms, including relational, NoSQL, and in-memory databases, which form the backbone of enterprise data management strategies. This segment holds the largest market share, driven by continuous innovations in database architectures, enhanced security features, and integration capabilities with emerging technologies such as AI and IoT. Organizations are increasingly investing in advanced DBMS software to manage the growing complexity and volume of data, ensure data integrity, and support mission-critical applications.
On the other hand, the services segment, which includes consulting, implementation, support, and maintenance, is experiencing rapid growth as enterprises seek to optimize their database environments. The complexity of modern database systems necessitates expert
The scientific community has entered an era of big data. However, with big data come big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quantity of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY either lacked applicability or traceability, leaving only 22.9% of the original data usable compared to the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up onl...
SLA data were downloaded from TRY (traits 3115, 3116, and 3117) for all conifer (Araucariaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae), Plantago, Poa, and Quercus species. The data have not been processed in any way, but additional columns have been added to the dataset that provide the viewer with information about where each data point came from, how it was cited, how it was measured, whether it was uploaded correctly, whether it had already been uploaded to TRY, and whether it was uploaded by the individual who collected the data.
There are two additional documents associated with this publication. One is a Word document that includes a description of each of the 120 datasets that contained SLA data for the four plant groups within the study (conifers, Plantago, Poa, and Quercus). The second is an Excel document that contains the SLA data that were downloaded from TRY and all associated metadata.
Missing data codes: NA and N/A
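As an illustration of how the applicability and traceability screening described above could be applied to such a download, here is a hedged pandas sketch. The file name and the flag columns are hypothetical placeholders, not the actual headers of the TRY export; only the missing-data codes come from the record above.

```python
import pandas as pd

# Hypothetical file name; the missing-data codes (NA, N/A) come from the record above.
df = pd.read_csv("TRY_SLA_export.csv", na_values=["NA", "N/A"])

# Hypothetical boolean flag columns marking applicability and traceability per data point.
applicable = (
    df["is_original"] & df["is_representative"] & df["is_logical"] & df["is_comparable"]
)
traceable = df["is_published"] & df["is_cited"] & df["is_consistent"]

usable = df[applicable & traceable]
print(f"{len(usable) / len(df):.1%} of records remain usable")
```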
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
NASA has the aim of researching aviation Real-time System-wide Safety Assurance (RSSA) with a focus on the development of prognostic decision support tools as one of its new aeronautics research pillars. The vision of RSSA is to accelerate the discovery of previously unknown safety threats in real time and enable rapid mitigation of safety risks through analysis of massive amounts of aviation data. Our innovation supports this vision by designing a hybrid architecture combining traditional database technology and real-time streaming analytics in a Big Data environment. The innovation includes three major components: a Batch Processing framework, Traditional Databases and Streaming Analytics. It addresses at least three major needs within the aviation safety community. First, the innovation supports the creation of future data-driven safety prognostic decision support tools that must pull data from heterogeneous data sources and seamlessly combine them to be effective for NAS stakeholders. Second, our innovation opens up the possibility to provide real-time NAS performance analytics desired by key aviation stakeholders. Third, our proposed architecture provides a mechanism for safety risk accuracy evaluations. To accomplish this innovation, we have three technical objectives and related work plan efforts. The first objective is the determination of the system and functional requirements. We identify the system and functional requirements from aviation safety stakeholders for a set of use cases by investigating how they would use the system and what data processing functions they need to support their decisions. The second objective is to create a Big Data technology-driven architecture. Here we explore and identify the best technologies for the components in the system including Big Data processing and architectural techniques adapted for aviation data applications. Finally, our third objective is the development and demonstration of a proof-of-concept.
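As a toy illustration of the hybrid batch-plus-streaming idea described above, the sketch below keeps historical records in a traditional database, summarizes newly arriving events over a sliding window, and merges the two views. All names, fields, and weights are illustrative assumptions, not part of the actual NASA architecture.

```python
import sqlite3
from collections import deque
from statistics import mean

# Batch layer: historical safety metrics stored in a traditional database (toy data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metrics (flight TEXT, risk REAL)")
db.executemany("INSERT INTO metrics VALUES (?, ?)",
               [("AA10", 0.12), ("AA10", 0.18), ("UA22", 0.30)])

def batch_view(flight):
    row = db.execute("SELECT AVG(risk) FROM metrics WHERE flight = ?", (flight,)).fetchone()
    return row[0]

# Streaming layer: rolling window over newly arriving events.
window = deque(maxlen=100)

def on_event(flight, risk):
    window.append((flight, risk))

def stream_view(flight):
    values = [r for f, r in window if f == flight]
    return mean(values) if values else None

# Serving layer: blend the historical and real-time estimates.
def combined_risk(flight, weight_stream=0.5):
    batch, stream = batch_view(flight), stream_view(flight)
    if stream is None:
        return batch
    return (1 - weight_stream) * batch + weight_stream * stream

for event in [("AA10", 0.25), ("AA10", 0.40)]:
    on_event(*event)
print(combined_risk("AA10"))  # blends the stored average with the live window
```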
https://www.archivemarketresearch.com/privacy-policy
The booming data migration tool market is projected to reach $15 billion in 2025, growing at a CAGR of 15% through 2033. Explore key trends, drivers, restraints, and leading companies shaping this dynamic sector. Discover insights on cloud-based solutions, data security, and regional market shares in our comprehensive analysis.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Motivated by the challenge of deep learning in the low-data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) linked by a transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity when only very limited data are available. To avoid dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is utilized to produce 500,000 molecules to pretrain the RNN, through which the model can learn sufficient structural knowledge. The pretrained RNN is then fine-tuned on the 303 energetic compounds to generate 7,153 molecules similar to the energetic compounds. In order to more reliably screen the molecules with a high detonation velocity, SMILES enumeration augmentation coupled with the pretrained knowledge is utilized to build an RNN-based prediction model, through which R2 is boosted from 0.4446 to 0.9572. The comparable performance of the transfer learning strategy based on an existing big database (ChEMBL) in producing energetic and drug-like molecules further supports the effectiveness and generality of our strategy in the low-data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in detonation velocity. All the source code and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.
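For readers unfamiliar with the pretrain-then-fine-tune pattern used here, below is a minimal PyTorch sketch on toy SMILES-like strings. The authors' actual code lives in the linked GitHub repository; all data, sizes, and hyperparameters in this sketch are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a "large" augmented set for pretraining, a small target set for fine-tuning.
pretrain_smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"] * 50
finetune_smiles = ["CC(N)=O", "O=[N+]([O-])c1ccccc1"] * 10

chars = sorted({c for s in pretrain_smiles + finetune_smiles for c in s}) + ["^", "$"]
stoi = {c: i for i, c in enumerate(chars)}

def encode(s):
    # Add start (^) and end ($) tokens and map characters to integer ids.
    return torch.tensor([stoi["^"]] + [stoi[c] for c in s] + [stoi["$"]])

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

def train(model, smiles, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for s in smiles:
            seq = encode(s).unsqueeze(0)          # shape (1, T)
            logits = model(seq[:, :-1])           # predict the next character
            loss = loss_fn(logits.squeeze(0), seq[0, 1:])
            opt.zero_grad(); loss.backward(); opt.step()

model = CharRNN(len(chars))
train(model, pretrain_smiles, epochs=2, lr=1e-3)   # pretraining on the augmented data
train(model, finetune_smiles, epochs=5, lr=1e-4)   # fine-tuning (transfer) on the target set
```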
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: A more accurate preoperative prediction of lymph node involvement (LNI) in prostate cancer (PCa) would improve clinical treatment and follow-up strategies of this disease. We developed a predictive model based on machine learning (ML) combined with big data to achieve this.
Methods: Clinicopathological characteristics of 2,884 PCa patients who underwent extended pelvic lymph node dissection (ePLND) were collected from the U.S. National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database from 2010 to 2015. Eight variables were included to establish an ML model. Model performance was evaluated by receiver operating characteristic (ROC) curves and calibration plots for predictive accuracy. Decision curve analysis (DCA) and cutoff values were obtained to estimate its clinical utility.
Results: Three hundred and forty-four (11.9%) patients were identified with LNI. The five most important factors were the Gleason score, T stage of disease, percentage of positive cores, tumor size, and prostate-specific antigen level, with 158, 137, 128, 113, and 88 points, respectively. The XGBoost (XGB) model showed the best predictive performance and had the highest net benefit when compared with the other algorithms, achieving an area under the curve of 0.883. With a 5%-20% cutoff value, the XGB model performed best in reducing omissions and avoiding overtreatment of patients when dealing with LNI. This model also had a lower false-negative rate, and a higher percentage of ePLND procedures was avoided. In addition, DCA showed it has the highest net benefit across the whole range of threshold probabilities.
Conclusions: We established an ML model based on big data for predicting LNI in PCa, and it could lead to a reduction of approximately 50% of ePLND cases. In addition, only ≤3% of patients were misdiagnosed with a cutoff value ranging from 5% to 20%. This promising study warrants further validation by using a larger prospective dataset.
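A hedged sketch of the kind of modelling and cutoff analysis described above, using xgboost's scikit-learn wrapper on synthetic data standing in for the eight SEER-derived variables; it is not the authors' code, and all numbers it prints are from the synthetic data.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the eight clinicopathological features and the LNI label.
rng = np.random.default_rng(0)
X = rng.normal(size=(2884, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=2884) > 1.8).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, proba), 3))

# Apply 5%-20% risk cutoffs, as in the study, to see who could be spared ePLND.
for cutoff in (0.05, 0.10, 0.20):
    spared = (proba < cutoff).mean()
    missed = ((proba < cutoff) & (y_te == 1)).mean()
    print(f"cutoff {cutoff:.0%}: {spared:.1%} spared ePLND, {missed:.1%} false negatives")
```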
Human communication abilities have greatly evolved over time. Speech/Text/Images/Videos are the channels we often use to communicate and to store/share information. "**Text**" is one of the primary modes of formal communication and might continue to be so for quite some time.
I wonder how many words a person would type in a lifetime, sending email/text messages or preparing documents. The count might run into millions. We are accustomed to keying in words without worrying much about the 'effort' involved in typing the word. We don't bother much about the origin of the word or the correlation between its meaning and its textual representation: 'Big' is actually smaller than 'Small', just going by the words' length.
I had some questions which, I thought, could be best answered by analyzing the BIG data we are surrounded with today. Since the data volume is growing at such high rates, can we bring about some kind of optimization or restructuring in word usage, so that we benefit in terms of data storage, transmission, and processing? Could scanning more documents provide better automated suggestions in email/chats, based on which word usually follows a particular word, and assist in quicker sentence completion?
Which set of words in the globally available text content, if we can identify and condense them, would reduce the overall storage space required?
Which set of words in regular usage (email/text/documents), if condensed, would reduce the total effort involved in typing (keying in the text) and reduce the overall size of the text content, which eventually might lead to shorter transmission times, less storage space, and less processing time for applications that feed on these data for analysis and decision making?
To answer these, we may have to parse the entire web and almost every email/message/blog post/tweet and machine-generated content that is in, or will be generated on, every phone/laptop/computer/server, by every person and bot. Considering the tonnes of text lying around in databases across the world (webpages/Wikipedia/text archives/digital libraries), and the multiple versions/copies of this content, parsing it all would be a humongous task. Fresh data is continually generated from various sources. The plate is never empty if the data is cooked at a rate faster than the available processing capability.
Here is an attempt to analyze a tiny chunk of data, to see whether the outcome is significant enough to take note of when the finding is generalized and extrapolated to larger databases.
Looking for a reliable source, I could not think of anything better than the Wikipedia database of webpages. Wiki articles are available for download as HTML dumps for offline processing (https://dumps.wikimedia.org/other/static_html_dumps/). The dump I downloaded is a ~40 GB compressed file (which turned into a ~208 GB folder containing ~15 million files upon extraction).
With my newly acquired R skills, I tried to parse the HTML pages and extract the distinct words with their total count in the page paragraphs. I could consolidate the output from the first million of the available 15 million HTML files. The attached dataset "WikiWords_FirstMillion.csv" is a comma-separated file with the list of words and their counts. There are two columns: "word" contains the distinct words as extracted from the paragraphs in the wiki pages, and "count" holds the count of occurrences in one million wiki pages. Non-alphanumeric characters were removed at the time of text extraction.
Any array of characters separated by spaces is included in the list of words, and the counts are presented as-is without any filters. To get better estimates, it should be OK to make suitable assumptions, such as considering root words or ignoring words that appear specific to Wikipedia pages (Welcome, Wikipedia, Articles, Pages, Edit, Contribution...).
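For anyone who wants to reproduce the extraction on a smaller scale, here is a rough Python equivalent of the step described above (the original analysis was done in R). The directory name is an assumption, and beautifulsoup4 must be installed; the output CSV follows the same word/count shape as WikiWords_FirstMillion.csv.

```python
import csv
import re
from collections import Counter
from pathlib import Path

from bs4 import BeautifulSoup  # assumes beautifulsoup4 is installed

counts = Counter()
for path in Path("wiki_html_dump").rglob("*.html"):  # assumed local dump directory
    soup = BeautifulSoup(path.read_text(errors="ignore"), "html.parser")
    # Keep only paragraph text, strip non-alphanumeric characters, and count words.
    text = " ".join(p.get_text(" ") for p in soup.find_all("p"))
    counts.update(re.sub(r"[^A-Za-z0-9 ]+", " ", text).split())

with open("WikiWords_sample.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["word", "count"])
    writer.writerows(counts.most_common())
```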
Thanks to Wikimedia, for providing the offline dumps, and to the R community, for the software/packages/blog posts/articles/suggestions and solutions on the Q&A sites.
(a) Which 24 words from the data set are most eligible to be upgraded to one-letter words, assuming it is decided to replace the existing words with the newly designated one-letter words to achieve storage efficiency?
(b) Assuming the word count in the data set is a fair estimate of the composition of words in the global text content (say we do a "Find" and "Replace" on the global text content): if the current big data size is 3 exabytes (10^18 bytes), and say 30% of i...
• 500M B2B Contacts
• 35M Companies
• 20+ Data Points to Filter Your Leads
• 100M+ Contacts with Direct Dial and Mobile Numbers
• Lifetime Support Until You Are 100% Satisfied
We are the best B2B database provider for high-performance sales teams. If you receive fake records by any chance, they are of no use to you; nothing is more frustrating than receiving useless data for which you have paid money.
Every 15 days, our devoted team updates our b2b leads database. In addition, we are always available to assist our clients with whatever data they are working with in order to ensure that our service meets their needs. We keep an eye on our b2b contact database to keep you informed and provide any assistance you require.
With our simple-to-use system and up-to-date B2B contact list, we hope to make your job easier. You’ll be able to filter your data at Lfbbd based on the industry you work in. For example, you can choose from real estate companies or just simply tap into the healthcare business. Our database is updated on a regular basis, and you will receive contact information as soon as possible.
Use our information to quickly locate new business clients, competitors, and suppliers. We’ve got your back, no matter what precise requirements you have.
We have over 500 million business-to-business contacts that you may segment based on your marketing and commercial goals. We don’t stop there; we’re always gathering leads from the right tool so you can reach out to a big database of your clients without worrying about email constraints.
Thanks to our database, you may create your own campaign and send as many emails or automated messages as you want. We collect the most viable B2B data to help you go a long way, as we seek to grow your business and enhance your sales.
The majority of our clients choose us since we have competitive costs when compared to others. In this digital era, marketing is more advanced, and customers are less willing to pay more for a service that produces poor results.
That’s why we’ve devised the most effective b2b database strategy for your company. You can also tailor your database and pricing to meet your specific business requirements.
• Connect directly with the right decision-makers, using the most accurate database of emails and direct dials. Build a clean prospecting list that you can plug into your sales tools and generate new leads from, right away.
• Over 500 million business contacts worldwide.
• You can filter your targeted leads by 20+ criteria including job title, industry, location, revenue, technology, and more.
• Find the email addresses of the professionals you want to contact one by one or in bulk.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Antibacterial drugs (ADs) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria increase interest in understanding metabolic network (MN) mutations and the interaction of ADs vs MNs. In this study, we employed the IFPTML (Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML)) algorithm on a huge dataset from the ChEMBL database, which contains 155,000 AD assays vs >40 MNs of multiple bacteria species. We built a linear discriminant analysis (LDA) model and 17 ML models centered on the linear index and based on atoms to predict antibacterial compounds. The IFPTML-LDA model presented the following results for the training subset: specificity (Sp) = 76% out of 70,000 cases, sensitivity (Sn) = 70%, and accuracy (Acc) = 73%. The same model presented the following results for the validation subsets: Sp = 76%, Sn = 70%, and Acc = 73.1%. Among the IFPTML nonlinear models, the k-nearest neighbors (KNN) model showed the best results, with Sn = 99.2%, Sp = 95.5%, Acc = 97.4%, and area under the receiver operating characteristic curve (AUROC) = 0.998 in the training sets. In the validation series, the Random Forest model had the best results: Sn = 93.96% and Sp = 87.02% (AUROC = 0.945). The IFPTML linear and nonlinear models regarding the ADs vs MNs have good statistical parameters, and they could contribute toward finding new metabolic mutations in antibiotic resistance and reducing time/costs in antibacterial drug research.
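For reference, the reported statistics (Sn, Sp, Acc, AUROC) can be computed from a model's predictions as sketched below with scikit-learn; the labels and scores here are synthetic stand-ins, not the study's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Synthetic labels and prediction scores standing in for a trained classifier's output.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
scores = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=1000), 0, 1)
y_pred = (scores >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # Sn: fraction of actives correctly flagged
specificity = tn / (tn + fp)   # Sp: fraction of inactives correctly rejected
accuracy = (tp + tn) / (tp + tn + fp + fn)
auroc = roc_auc_score(y_true, scores)

print(f"Sn={sensitivity:.1%}  Sp={specificity:.1%}  Acc={accuracy:.1%}  AUROC={auroc:.3f}")
```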
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The momentum of Databricks (not yet public) and Snowflake to re-write the reality of data in the Cloud truly is a sight to behold. Snowflake fell in extended trading on Wednesday after the company released third-quarter results that beat estimates but offered light product revenue guidance. However, its stock price later bounced back even stronger. Go figure? A lot of hype… SNOW up 10%, it’s really in a league of its own for growth in the Cloud. With a market cap close to $50 billion (data taken from its stock price in Excel), it’s hard to evaluate the business fundamentals away from the hype. The promise of unified data is very tantalizing indeed, yet when Databricks goes public, I don’t see Snowflake dominating like it has been in recent times. The company’s platform enables customers to consolidate data into a single source to drive business insights, build data-driven applications, and share data. Snowflake said it anticipates product revenue will be between $535 million and $540 million in its fourth quarter, short of the $553 million expected by analysts’ estimates according to StreetAccount. Yet investors don’t seem to mind. Snowflake Inc. shares have lost about 59.7% since the beginning of the year versus the S&P 500's decline of 17%. The recent low was $119 in June 2022. You can read their Earnings here.
Snowflake Earnings
- Product revenue of $522.8 million in the third quarter, representing 67% year-over-year growth
- Remaining performance obligations of $3.0 billion, representing 66% year-over-year growth
- 7,292 total customers
- Net revenue retention rate of 165%
- 287 customers with trailing 12-month product revenue greater than $1 million
- Revenue for the quarter was $557.0 million, representing 67% year-over-year growth
I can see why Snowflake is so popular though.
Snowflake is Wildly Popular
Currently Snowflake is wildly popular as one of the best growth stocks. Snowflake provides an end-to-end data warehousing solution. There is practically no limit to the number of databases and warehouses you can create (of course, you need Snowflake credits for creating and using warehouses). It's a highly scalable solution that adheres to all the data security best practices. I do believe Databricks is the better company in the end, but time will tell. Why would this company have a market cap of $50 billion already? They have 5,000 employees, lose money at a good clip, and will experience significant competition in the coming years.
Net Loss is Concerning
Snowflake may not respond well to the significant slowdown in spending we are likely to see in 2023. It needs to significantly reset to be tempting. It does have 287 customers with trailing 12-month product revenue greater than $1 million, which is encouraging. Its growth in the 45-50% range is still very impressive for a company of its size. After 2023 we’ll have a much better idea of the real momentum of Snowflake. Snowflake reported 34% year-over-year growth in the number of customers, reaching 7,292 in the reported quarter. The company added 28 Forbes Global 2000 customers in the reported quarter. Snowflake signed 14 new customers with $1 million in trailing 12-month product revenues in the reported quarter. The real question is how much it will slow down in 2023.
The Data Cloud is still Nascent
While it’s appealing to invest in first-movers like Snowflake or later Databricks, how will competition and the Data Cloud continue to evolve? It’s fairly hard to predict. Snowflake is not a traditional SaaS model; it’s pay-as-you-go, consumption based.
It’s not yet clear if this is the right business model for optimal profitability.
- 93% of revenue is consumption-based
- Revenue recognized only as consumption occurs
- In many cases, rollover of unused capacity permitted, generally on the purchase of additional capacity
- Contract durations increasing along with larger customer commitments
- Primarily billed annually in advance with some on-demand in arrears
If you are uncertain of Snowflake’s growth you can visit their visual Earnings PDF. The TAM of the Data Cloud is big enough to leave room for a lot of different kinds of companies and competitors. In our opinion, Snowflake is in a prime position to compete with AMD, one of the best value stocks. To read more about other Cloud Computing companies, check out: Apple Fair Value, Apple P/E, Apple EV/EBITDA, Microsoft Fair Value, Microsoft P/E, Microsoft EV/EBITDA, Tesla Fair Value, Tesla P/E, Tesla EV/EBITDA, Amazon Fair Value, Amazon P/E, Amazon EV/EBITDA, Netflix Fair Value, Netflix P/E, Netflix EV/EBITDA.
https://www.gesis.org/en/institute/data-usage-terms
The prepared longitudinal dataset for 2014 to 2016 is "big data", which is why the full dataset will only be available in the form of a database (MySQL). In this database, the information for the different variables of a respondent is organized in one column, one below the other. The present publication comprises an SQL database with the metadata of a sample of the full dataset, which represents a subset of the available variables of the full dataset and is intended to show the structure of the prepared data, together with a data documentation (codebook) of the sample. For this purpose, the sample contains all variables on sociodemographics, leisure behavior, additional information on a respondent and his or her household, as well as the interview-specific variables and weights. Only the variables concerning the respondent's media use are a small selection: for online media use, the variables of all overall offerings as well as the individual offerings of the genres politics and digital were included. The media use of radio, print, and TV was not included in the sample, since its structure can be traced using the published longitudinal data of the media analyses MA Radio, MA Pressemedien, and MA Intermedia.
Due to the size of the data material, a database with the actual survey data would already be in the critical range of file size for normal upload and download. The actual survey results needed for analysis will therefore be published in 2021 as the full dataset of the Media-Analyse data: IntermediaPlus (2014-2016) in the DBK at GESIS.
The data and their preparation are proposed as a best-practice case for big data management, i.e., for handling big data in the social sciences and with social science data. Using the GESIS software CharmStats, which was extended with big data features as part of this project, the harmonization work is documented and made transparent. A Python script and an HTML template further automated the workflow around and with CharmStats.
The prepared longitudinal version of the full MA IntermediaPlus dataset for 2014 to 2016 will be published in 2021 in cooperation with GESIS and made available in accordance with the FAIR principles (Wilkinson et al. 2016). By harmonizing the individual cross-sections, the goal is to make the Media-Analyse data source, prepared by Inga Brentel and Céline Fabienne Kampes as part of the dissertation project "Angebots- und Publikumsfragmentierung online", accessible for research on social and media change in the Federal Republic of Germany.
Future study number of the full IntermediaPlus dataset in the DBK at GESIS: ZA5769 (version 1-0-0), DOI: https://dx.doi.org/10.4232/1.13530
Hello fellow coders,
This huge dataset contains all the songs in Spotify's Daily Top 200 charts in 35+1 (global) countries around the world for a period of over 3 years (2017-2020).
We are 6 university students who used this database for our Big Data class, hence we already did most (if not all) of the necessary data cleaning. Our research question was to understand the impact of many variables on a song's popularity and see if there was any significant national difference.
You can find 2 files attached:
"Database to Calculate Popularity" includes all the daily entries (8mln+) for the songs which made it to the top 200 . Among these data, quite intuitively, you will find the same song being in the charts for more than one day. We then created a popularity score, unique for a given song in a given country, which took into account the position in the charts and the days it stayed there.
"Final Database" includes many data for each song. It aggregates the populairty for songs into a single score for each. For each song several variables were retrieved by using Spotify's API (such as artist, country, genre, ...)
The following notes will clarify doubts you may have about the data; if you still have some, feel free to drop a question in the discussion section!
NB You can see that the 8+mln songs of the first database are reduced to "only" tens of thousands in the other. Why? This is because the POPULARITY score was created by US and aggregates into a single score the whole period a same song stayed in the charts of the same country. The popularity given by Spotify takes into account the time at which data are seen, hence a song which dominated the charts a few years back now scores very low in this parameter. This is why we created our new score, which includes the number of days a song stayed in the charts and at which position, adjusted with a modifier to give more weight to top positions.
NB We calculated popularity as follows:
- We assigned a score from 1 to 200 to each song: #1 ranked gets 200, #2 ranked gets 199, …, #200 ranked gets 1.
- We multiplied it by a modifier: 3 for #1, 2.2 for #2, 1.7 for #3, 1.3 for #4-10, 1 for #11-50, 0.85 for #51-100, 0.8 for #101-200.
- We did this for every day.
- We summed up the daily scores for a SAME song in a SAME country (NB the same song in different countries has different popularity).
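A minimal sketch of that scoring rule, assuming daily chart entries come as (country, song, rank) tuples; the field names and example values are illustrative only.

```python
from collections import defaultdict

def modifier(rank):
    # Weights from the scoring rule above, favouring top positions.
    if rank == 1:   return 3.0
    if rank == 2:   return 2.2
    if rank == 3:   return 1.7
    if rank <= 10:  return 1.3
    if rank <= 50:  return 1.0
    if rank <= 100: return 0.85
    return 0.8                      # ranks 101-200

def popularity(daily_entries):
    """Sum the weighted daily scores per (song, country)."""
    scores = defaultdict(float)
    for country, song, rank in daily_entries:
        scores[(song, country)] += (201 - rank) * modifier(rank)  # rank #1 -> base 200
    return dict(scores)

entries = [("IT", "song_a", 1), ("IT", "song_a", 3), ("US", "song_a", 150)]
print(popularity(entries))  # {('song_a', 'IT'): 936.6, ('song_a', 'US'): 40.8}
```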
NB In the final database the title of a same song is repeated more than once. Why? Because we recorded the songs WITHIN each of the 35+1 countries, hence if a same song became popular in more than one country (which is the case for most songs) it will figure in the charts of both countries. Nonetheless, the popularity score of the SAME song could be DIFFERENT in two different countries, as each country has its own taste in music!
NB we used NLP and LDA techniques to assign a Tone, Emotion and Topic to the songs in ENGLISH SPEAKING COUNTRIES only. Most of them were correctly recorded but in some cases the lyrics could not be retrieved hence the data are missing.
Below you can find a description of every variable:
- Title: Name of a song
- URI: Unique identifier of a song created by Spotify
- Country: Global and 34 countries where Spotify operates, namely Argentina, Australia, Austria, Belgium, Brazil, Canada, Chile, Colombia, Costa Rica, Denmark, Ecuador, Finland, France, Germany, Great Britain, Indonesia, Ireland, Italy, Mexico, Malaysia, Netherlands, New Zealand, Norway, Peru, Philippines, Poland, Portugal, Singapore, Spain, Sweden, Switzerland, Taiwan, Turkey, USA
- Popularity: The popularity score calculated taking into account both the number of days a song stayed in the Top 200 and the position it stayed in every day, weighting more the top positions
- Artist: Name of the song's artist
- Album/Single: Whether the song was published as a single or as part of an album or compilation
- Genre: The predominant genre of an artist according to Spotify’s classification
- Artist_followers: The number of followers the artist has on Spotify on the 5th of November 2020
- Explicit: Whether the song is rated as ‘Parental Advisory Explicit Content’ or not
- Album: Name of the album the song belongs to
- Release_date: Date on which the song was published
- Track_number: The position of the song on its respective album
- Track_album: Total songs present in the album
- Danceability: How suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable
- Energy: A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy
- Key: The estim...
Typically e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found under the title "Online Retail".
"This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."
Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.
Analyses for this dataset could include time series, clustering, classification and more.
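As a hedged starter for the clustering idea, the sketch below aggregates the transactions into per-customer recency/frequency/monetary features. The file name and column names follow the UCI "Online Retail" sheet but should be verified against your copy.

```python
import pandas as pd

# Assumed file name; columns (InvoiceNo, Quantity, InvoiceDate, UnitPrice, CustomerID)
# follow the UCI "Online Retail" sheet -- adjust if your copy differs.
df = pd.read_excel("Online Retail.xlsx", parse_dates=["InvoiceDate"])
df = df.dropna(subset=["CustomerID"])
df["Revenue"] = df["Quantity"] * df["UnitPrice"]

# Per-customer RFM-style features, a common starting point for clustering customers.
snapshot = df["InvoiceDate"].max()
rfm = df.groupby("CustomerID").agg(
    recency_days=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    frequency=("InvoiceNo", "nunique"),
    monetary=("Revenue", "sum"),
)
print(rfm.describe())
```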
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a curated list of the top-rated movies on TMDB (The Movie Database), a popular online movie database known for its comprehensive collection of film data. The dataset includes detailed information about the highest-rated films according to user ratings, focusing on films that have received significant acclaim from viewers.
This dataset can be helpful for building a movie recommendation model.
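A hedged sketch of one way to do that, using a simple content-based recommender; the file name and the "title"/"overview" columns are assumptions to adjust to whatever the downloaded CSV actually contains.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumed file and column names -- verify against the actual download.
movies = pd.read_csv("tmdb_top_rated.csv").dropna(subset=["overview"]).reset_index(drop=True)

# Represent each movie by the TF-IDF of its overview text and compare them pairwise.
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(movies["overview"])
similarity = cosine_similarity(matrix)

def recommend(title, n=5):
    idx = movies.index[movies["title"] == title][0]
    ranked = similarity[idx].argsort()[::-1][1 : n + 1]  # skip the movie itself
    return movies.loc[ranked, "title"].tolist()

print(recommend("The Godfather"))  # example title; it must exist in the file
```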
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains anime images for 231 different anime, with approximately 380 images for each of those anime. Please note that you might need to clean the image directories a bit, since the images might contain merchandise and live-action photos in addition to the actual anime itself.
If you'd like to take a look at the scripts used to make this dataset, you can find them on this GitHub repo.
Feel free to extend it, scrape your own images, etc. etc.
As a big anime fan, I found a lot of anime-related datasets on Kaggle. I was however disappointed to find no dataset containing anime-specific images for popular anime. Some other great datasets that I've been inspired by include:
- Top 250 Anime 2023
- Anime Recommendations Database
- Anime Recommendation Database 2020
- Anime Face Dataset
- Safebooru - Anime Image Metadata
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Search terms for Health Information Exchange policy, standards, or implementation challenges in Africa.