Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Publicly accessible databases often impose query limits or require registration. Although I maintain public, limit-free APIs, I never wanted to host a public database, because I tend to think that connection strings are a problem for the user.
I’ve decided to host several light- and medium-sized databases on PostgreSQL, MySQL and SQL Server backends (in strict descending order of preference!).
Why three database backends? There are a ton of small edge cases when moving between DB backends, so testing extensively against live databases is quite valuable. With this resource you can benchmark speed, compression, and DDL types.
Please send me a tweet if you need the connection strings for your lectures or workshops. My Twitter username is @pachamaltese. See the SQL dumps in each section to get the data locally.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.
The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents
For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/
“Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.
Banner photo by Helloquence on Unsplash
https://dataintelo.com/privacy-and-policy
According to our latest research, the Distributed SQL Database as a Service market size reached USD 1.46 billion in 2024, reflecting the rapid adoption of cloud-native, scalable database solutions across industries. The market is projected to grow at a robust CAGR of 28.7% from 2025 to 2033, reaching an estimated USD 13.87 billion by 2033. This remarkable growth is primarily driven by the increasing demand for highly available, globally distributed databases that support mission-critical applications, as well as the surge in digital transformation initiatives worldwide.
The exponential growth of the Distributed SQL Database as a Service market can be attributed to the accelerating shift towards cloud-based infrastructure across enterprises of all sizes. Organizations are increasingly seeking solutions that offer both the consistency and scalability of traditional SQL databases, combined with the elasticity and resilience of distributed architectures. As businesses expand their digital footprints and require real-time data access across geographies, distributed SQL databases provide a compelling value proposition. This is particularly evident in sectors such as BFSI, retail, and telecommunications, where transactional integrity and uptime are paramount. The proliferation of IoT devices, edge computing, and global e-commerce platforms has further amplified the need for databases that can seamlessly handle high volumes of distributed transactions without compromising on performance or reliability.
Another major growth factor is the rising complexity of data management in multi-cloud and hybrid environments. Enterprises are moving away from monolithic, on-premises databases in favor of flexible, cloud-native solutions that can be deployed across public, private, and hybrid clouds. Distributed SQL Database as a Service platforms enable organizations to avoid vendor lock-in, ensure business continuity, and achieve geographic redundancy. The ability to scale horizontally, maintain ACID compliance, and support multi-region deployments is driving adoption among large enterprises and SMEs alike. Furthermore, the integration of advanced analytics, AI/ML capabilities, and automated management features is transforming these platforms into strategic assets for digital-first organizations.
Security, compliance, and data sovereignty concerns are also shaping the market landscape. Distributed SQL Database as a Service providers are investing heavily in robust security frameworks, encryption standards, and regulatory compliance features to address the stringent requirements of industries such as healthcare, government, and financial services. The growing emphasis on data privacy, as well as the need to comply with regional regulations like GDPR and CCPA, is compelling enterprises to adopt solutions that offer granular control over data placement and access. This trend is expected to intensify as organizations prioritize secure, compliant, and resilient database infrastructures to support their evolving business models.
From a regional perspective, North America currently dominates the Distributed SQL Database as a Service market, accounting for more than 42% of global revenue in 2024. The region's leadership is fueled by the presence of major cloud service providers, a mature digital ecosystem, and significant investments in AI, IoT, and big data analytics. However, Asia Pacific is emerging as the fastest-growing market, driven by rapid cloud adoption, expanding digital economies, and government-led digitalization initiatives. Europe also holds a substantial share, supported by strong regulatory frameworks and a focus on data sovereignty. Latin America and the Middle East & Africa are witnessing steady growth, propelled by increasing cloud penetration and the modernization of legacy IT infrastructure.
The Component segment of the Distributed SQL Database as a Service market is bifurcated into Software and Services. The software sub-segment is the backbone of this market, encompassing the core database engines, management consoles, and integration APIs that power distributed SQL platforms. The demand for robust software solutions is being driven by the need for high performance, low-latency data processing, and seamless scalability. Enterprises are increasingly opting for software that supports automated failover, sharding, an
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Explore our public data on competitions, datasets, kernels (code / notebooks) and more
Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.
Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.
This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.
Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.
In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here
We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.
UserId column in the ForumMessages table has values that do not exist in the Users table.
True or False.
Total columns. For example, the DatasetCount is not the total number of datasets with the Tag according to the DatasetTags table.
db_abd_create_tables.sql script.
clean_data.py script. The script does the following steps for each table:
NULL.
add_foreign_keys.sql script.
Total columns in the database tables. I do that by running the update_totals.sql script.
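The mismatch noted above, `UserId` values in `ForumMessages` that do not exist in `Users`, is the kind of issue an anti-join can surface before foreign keys are added. The following is only a minimal sketch using Python's built-in `sqlite3` with made-up toy rows; the `Users.Id` column name and the helper name `find_orphan_user_ids` are assumptions for illustration, not taken from the scripts mentioned above.

```python
import sqlite3


def find_orphan_user_ids(conn):
    """Return distinct ForumMessages.UserId values with no matching Users.Id."""
    rows = conn.execute(
        """
        SELECT DISTINCT fm.UserId
        FROM ForumMessages AS fm
        LEFT JOIN Users AS u ON u.Id = fm.UserId
        WHERE fm.UserId IS NOT NULL   -- ignore messages with no user at all
          AND u.Id IS NULL            -- keep only ids missing from Users
        ORDER BY fm.UserId
        """
    ).fetchall()
    return [row[0] for row in rows]


# Toy data, invented for illustration only (column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Users (Id INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE ForumMessages (Id INTEGER PRIMARY KEY, UserId INTEGER, Message TEXT);
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO ForumMessages VALUES
        (10, 1, 'hi'), (11, 3, 'orphan'), (12, NULL, 'anon'), (13, 3, 'orphan again');
    """
)
orphans = find_orphan_user_ids(conn)
print(orphans)
```

Rows found this way would have to be set to NULL or removed before a foreign key constraint on `UserId` could hold; which of the two is appropriate depends on the cleaning rules, which the description above does not fully preserve.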
OR-Trans is a GIS road centerline dataset compiled from numerous data sources throughout the state. Each source dataset comes from the road authority responsible for (or assigned data maintenance for) the road data it contains. The source datasets are compiled into a statewide dataset that has the best available data from each road authority for its jurisdiction (or assigned data maintenance responsibility). Data is stored in a SQL database and exported in numerous formats. Additional metadata resource: https://geoportalprod-ordot.msappproxy.net/geoportal/catalog/main/home.page
https://www.wiseguyreports.com/pages/privacy-policy
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 6.08 (USD Billion) |
| MARKET SIZE 2025 | 6.91 (USD Billion) |
| MARKET SIZE 2035 | 25.0 (USD Billion) |
| SEGMENTS COVERED | Deployment Model, Database Type, End User, Operating System, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Rapid digital transformation, Increased data volume, Rising adoption of microservices, Enhanced scalability requirements, Growing emphasis on data security |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | Databricks, MariaDB, Amazon Web Services, DigitalOcean, Microsoft, MongoDB, Google, Redis Labs, Oracle, FaunaDB, PlanetScale, Confluent, Couchbase, Cockroach Labs, Timescale, IBM |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Scalability across diverse applications, Enhanced security and compliance features, Integration with AI and ML, Multi-cloud strategy adoption, Real-time data processing capabilities |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 13.7% (2025 - 2035) |
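
The growth rate in the last row can be cross-checked against the 2025 and 2035 market-size rows. This is a minimal sketch, assuming the rate compounds once per year over the ten years from 2025 to 2035 (the table does not state its convention); `implied_cagr` is a hypothetical helper name.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the table: USD 6.91 billion (2025) and USD 25.0 billion (2035).
rate = implied_cagr(6.91, 25.0, years=2035 - 2025)
print(f"{rate:.1%}")  # prints 13.7%
```

Under that assumption the endpoints and the stated CAGR agree to one decimal place.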
According to our latest research, the global Distributed SQL Database as a Service market size reached USD 1.12 billion in 2024, reflecting robust momentum in cloud-native database adoption. The market is poised for substantial growth, projected to expand at a CAGR of 25.6% from 2025 to 2033. By the end of 2033, the market is expected to achieve a value of approximately USD 8.8 billion. This remarkable growth trajectory is primarily driven by enterprises’ increasing demand for high-availability, scalable, and globally distributed data management solutions, as well as the proliferation of cloud infrastructure and digital transformation initiatives across all major industries.
A key growth factor for the Distributed SQL Database as a Service market is the rapid shift towards cloud-native architectures and microservices-based applications. Enterprises are increasingly realizing the limitations of traditional relational databases in handling globally distributed workloads and mission-critical, real-time transactional data. The need for elastic scalability, continuous availability, and seamless geo-distribution has propelled organizations to adopt distributed SQL databases delivered as a service. This shift is further reinforced by the growing adoption of hybrid and multi-cloud strategies, which require databases capable of operating efficiently across diverse cloud and on-premises environments. As organizations prioritize agility and business continuity, the demand for Distributed SQL Database as a Service is expected to accelerate over the forecast period.
Another significant driver is the surge in data volumes generated by digital business processes, IoT devices, and customer-facing applications. Modern enterprises, especially those in sectors such as BFSI, retail, e-commerce, and telecommunications, require robust data platforms that can process, analyze, and store massive amounts of structured and semi-structured data in real time. Distributed SQL Database as a Service solutions offer horizontal scaling, strong consistency, and automated failover, making them ideal for supporting high-throughput transaction management and analytics workloads. Furthermore, the integration of advanced security features, compliance capabilities, and automated management tools has made these solutions attractive for organizations seeking to reduce operational complexity and total cost of ownership.
The market’s expansion is also fueled by the increasing focus on digital transformation and modernization of legacy IT systems. As enterprises embark on cloud migration journeys, they are leveraging Distributed SQL Database as a Service to modernize their data infrastructure, enhance application performance, and improve customer experiences. The proliferation of SaaS, mobile, and edge computing applications necessitates databases that can operate seamlessly across geographies and deliver low-latency access to data. Additionally, the availability of flexible deployment models, including public, private, and hybrid clouds, allows organizations to tailor their database strategies to meet regulatory, security, and performance requirements. These factors collectively contribute to the sustained growth of the Distributed SQL Database as a Service market.
From a regional perspective, North America continues to dominate the Distributed SQL Database as a Service market, accounting for the largest revenue share in 2024, owing to the early adoption of cloud technologies and the presence of leading technology vendors. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitalization, increased cloud investments, and expanding IT infrastructure in countries such as China, India, and Japan. Europe also demonstrates strong growth potential, supported by stringent data protection regulations and the rising adoption of cloud-based database solutions among enterprises. Latin America and the Middle East & Africa are gradually catching up, with increasing awareness and investments in cloud-native data platforms. The regional landscape is expected to evolve further as organizations worldwide embrace distributed database technologies to gain competitive advantage.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Distributed SQL Database market size reached USD 1.75 billion in 2024, marking a significant milestone in the evolution of enterprise data management. With a robust compound annual growth rate (CAGR) of 27.3% from 2025 to 2033, the market is projected to soar to USD 12.5 billion by 2033. This impressive growth trajectory is primarily fueled by the surging demand for scalable, resilient, and highly available database solutions across diverse sectors, driven by the exponential increase in data volumes and the necessity for real-time analytics in mission-critical applications.
The primary growth factor underpinning the expansion of the Distributed SQL Database market is the escalating requirement for high availability and fault tolerance in enterprise IT environments. Modern organizations are increasingly adopting distributed architectures to ensure uninterrupted business operations, even in the face of hardware failures or network outages. Distributed SQL databases, with their inherent capability to replicate data across multiple nodes and geographies, offer a compelling solution for enterprises seeking to minimize downtime and data loss. This demand is further amplified by the proliferation of cloud-native applications and microservices architectures, where traditional monolithic databases struggle to keep pace with the needs of dynamic, distributed workloads.
Another key driver for the Distributed SQL Database market is the rapid digital transformation initiatives being undertaken across industries such as BFSI, retail, healthcare, and manufacturing. Enterprises are leveraging distributed SQL databases to enable real-time analytics, support omnichannel customer experiences, and meet stringent regulatory requirements for data integrity and security. The increasing adoption of Internet of Things (IoT) devices and edge computing is also generating vast amounts of decentralized data, necessitating distributed database solutions that can seamlessly scale and process information at the edge while maintaining transactional consistency and global visibility.
Moreover, the growing preference for hybrid and multi-cloud strategies is accelerating the adoption of distributed SQL databases. As organizations seek to avoid vendor lock-in and optimize their IT infrastructure for cost, performance, and compliance, distributed SQL databases provide the flexibility to deploy workloads across on-premises, public cloud, and edge environments. This flexibility not only enhances operational agility but also empowers enterprises to respond swiftly to changing business requirements and regulatory landscapes. The ability of distributed SQL databases to offer strong consistency, horizontal scalability, and global data distribution is positioning them as a foundational technology in the era of digital business.
From a regional perspective, North America currently dominates the Distributed SQL Database market, accounting for the largest share in 2024, driven by the presence of leading technology vendors, early adoption of cloud-native solutions, and substantial investments in digital infrastructure. Asia Pacific, however, is emerging as the fastest-growing region, propelled by rapid economic development, expanding digital ecosystems, and increasing adoption of advanced data management solutions in countries such as China, India, and Japan. Europe and Latin America are also witnessing steady growth, supported by digital transformation initiatives and the rising demand for real-time data analytics across various sectors.
The Distributed SQL Database market is segmented by component into Software and Services, with each category playing a vital role in the overall ecosystem. The software segment, encompassing database engines, management tools, and integration platforms, accounted for the lion’s share of the market revenue in 2024. This dominance can be attributed to the continuous innovation in database architectures, improvements in query optimization, and the integration of advanced features such as automated failover, distributed transactions, and real-time analytics. Vendors are focusing on enhancing their software offerings to support a wide array of deployment scenarios, including hybrid cloud, multi-cloud, and edge environments, which is further boosting the demand for robust distributed
https://researchintelo.com/privacy-and-policy
According to our latest research, the Global Distributed SQL Database market size was valued at $1.2 billion in 2024 and is projected to reach $7.8 billion by 2033, expanding at a robust CAGR of 23.1% during the forecast period of 2025–2033. The primary driver fueling this remarkable growth is the escalating demand for highly available, horizontally scalable, and resilient database architectures among enterprises undergoing digital transformation. As organizations increasingly migrate mission-critical workloads to the cloud and require real-time, global data consistency, distributed SQL databases have emerged as a pivotal solution, offering both the scalability of NoSQL systems and the transactional guarantees of traditional relational databases. This convergence of scalability and consistency is proving indispensable in supporting modern application workloads, especially in industries where uptime, performance, and data integrity are non-negotiable.
North America currently commands the largest share of the Distributed SQL Database market, accounting for approximately 38% of the global revenue in 2024. This dominance is underpinned by a mature IT ecosystem, widespread adoption of cloud-native architectures, and a high concentration of technology-forward enterprises across sectors such as BFSI, IT and telecommunications, and retail. The United States, in particular, is home to major distributed SQL database vendors and benefits from a vibrant culture of innovation, robust venture capital activity, and proactive regulatory frameworks that encourage digital infrastructure modernization. Furthermore, North American enterprises are early adopters of hybrid and multi-cloud strategies, which necessitate distributed databases capable of maintaining strong consistency and low latency across diverse environments.
Asia Pacific is poised to be the fastest-growing region in the Distributed SQL Database market with an anticipated CAGR of 27.5% from 2025 to 2033. This rapid growth is driven by surging investments in digital transformation initiatives, especially in China, India, Japan, and Southeast Asia. Enterprises in these economies are actively modernizing their IT infrastructures, with a particular focus on cloud migration, real-time analytics, and omnichannel customer experiences. Government-led smart city projects, expanding fintech ecosystems, and the proliferation of e-commerce platforms are further spurring demand for distributed SQL databases that can handle massive transaction volumes and deliver high availability across geographically dispersed locations. As a result, global and regional vendors are intensifying their presence and partnerships in Asia Pacific to capitalize on this burgeoning opportunity.
Emerging markets in Latin America, the Middle East, and Africa are also witnessing a gradual uptick in distributed SQL database adoption, albeit from a lower base. These regions face unique challenges such as limited IT infrastructure, budget constraints, and a shortage of skilled database professionals. However, localized demand is being catalyzed by the rise of digital banking, regulatory mandates for data sovereignty, and the increasing digitization of public services. Policy reforms aimed at fostering technology adoption and the entry of global cloud service providers are beginning to bridge the digital divide, but market penetration remains uneven. Overcoming barriers such as connectivity issues and legacy system integration will be crucial for unlocking the full potential of distributed SQL databases in these emerging economies.
| Attributes | Details |
| Report Title | Distributed SQL Database Market Research Report 2033 |
| By Component | Software, Services |
| By Deployment Mode | On-Premises, Cloud |
| By Application | Transaction Management, Analytics, D |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
E.g. house number and street name.
*E.g. city.
Description of missing data on variables used for the linkage from the laboratory, case notifications and an example pre-entry screening dataset, by NHS number availability and validity.
https://www.archivemarketresearch.com/privacy-policy
The Database as a Service (DaaS) platform market is experiencing robust growth, driven by the increasing adoption of cloud computing, the need for scalable and cost-effective database solutions, and the rising demand for real-time data processing. Let's assume, for illustrative purposes, a 2025 market size of $50 billion with a Compound Annual Growth Rate (CAGR) of 15% for the forecast period of 2025-2033. This implies significant expansion, reaching an estimated market value exceeding $150 billion by 2033. This growth is fueled by several key trends including the proliferation of big data analytics, the expanding adoption of serverless architectures, and the growing preference for managed services that reduce operational overhead for businesses. Major players like AWS, Microsoft Azure, Google Cloud Platform, and others are heavily investing in enhancing their DaaS offerings, fostering competition and innovation. However, challenges remain, including security concerns related to data stored in the cloud, vendor lock-in, and the complexity of migrating existing databases to a DaaS environment. The competitive landscape is intensely dynamic, with established tech giants alongside specialized DaaS providers vying for market share. The segmentation of the market is likely based on deployment model (public, private, hybrid), database type (SQL, NoSQL), and industry vertical. Future growth will be influenced by factors such as advancements in database technologies (e.g., graph databases, in-memory databases), increasing adoption of artificial intelligence and machine learning for database management, and the growing demand for data sovereignty and compliance solutions. The market's continued expansion is assured, but the precise trajectory will depend on the evolution of cloud technologies, regulatory changes, and the ability of providers to address security and scalability challenges effectively. 
This robust growth presents significant opportunities for both established and emerging players within the DaaS landscape.
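The illustrative figures above (a 2025 market size of $50 billion and a 15% CAGR through 2033) can be checked with the standard compound-growth formula. This is a small sketch, assuming annual compounding over the eight years from 2025 to 2033; `project_value` is a hypothetical helper name.

```python
def project_value(start_value, annual_rate, years):
    """Value after compounding `annual_rate` once per year for `years` years."""
    return start_value * (1 + annual_rate) ** years


# Illustrative figures from the text: USD 50 billion in 2025, 15% CAGR to 2033.
value_2033 = project_value(50.0, 0.15, years=2033 - 2025)
print(round(value_2033, 2))  # about 152.95 (USD billion)
```

Under those assumptions the result is consistent with the "exceeding $150 billion by 2033" statement.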
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.
Database Management Systems
As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset consists of 8,034 entries designed to evaluate the performance of text-to-SQL models. Each entry contains a natural language text query and its corresponding SQL command. The dataset is a subset derived from the Spider dataset, focusing on diverse and complex queries to challenge the understanding and generation capabilities of machine learning models.
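One common way to score a text-to-SQL model on pairs like these is execution match: run the reference query and the model's query against the same database and compare the returned rows. The description does not say which metric is intended, so the following is only a sketch under that assumption; the toy `singer` table, the example pairs, and the name `execution_match` are invented for illustration and are not taken from the dataset.

```python
import sqlite3


def execution_match(conn, gold_sql, predicted_sql):
    """True if both queries run and return the same rows, ignoring row order."""
    try:
        gold_rows = conn.execute(gold_sql).fetchall()
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to execute counts as a miss.
        return False
    return sorted(gold_rows, key=repr) == sorted(predicted_rows, key=repr)


# Toy database and examples, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Ben', 45), (3, 'Cy', 27);
    """
)
examples = [
    {"question": "How many singers are there?",
     "gold": "SELECT COUNT(*) FROM singer",
     "predicted": "SELECT COUNT(id) FROM singer"},
    {"question": "List the names of singers older than 30.",
     "gold": "SELECT name FROM singer WHERE age > 30",
     "predicted": "SELECT name FROM singer WHERE age > 40"},
]
results = [execution_match(conn, ex["gold"], ex["predicted"]) for ex in examples]
accuracy = sum(results) / len(results)
print(results, accuracy)
```

Execution match accepts differently worded queries that return the same rows, but it can also pass a wrong query that happens to agree on a small database, so it is often reported alongside a stricter string or structure comparison.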
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chi squared test, not including missing data for each variable other than NHS number.
*At least one social risk factor including drug use, homelessness, alcohol misuse/abuse, prison.
Descriptive analysis of case notifications dataset for records with and without an NHS number.
Chicago sites that offer free or affordable technology resources and services, like computers with Internet access, Wi-Fi hotspots and technology training. Call or visit the organization's website before going to the location. For more information, visit http://locations.weconnectchicago.org/.
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
See the Splitgraph documentation for more information.
https://www.zionmarketresearch.com/privacy-policy
The global in-memory database market is expected to reach revenue of around USD 36.21 billion by 2032, growing at a CAGR of 19.2% between 2024 and 2032.
https://www.datainsightsmarket.com/privacy-policy
The Cloud Database MySQL market is experiencing robust growth, driven by the increasing adoption of cloud computing and the inherent scalability and cost-effectiveness of MySQL. The market's substantial size, estimated at $15 billion in 2025, reflects a significant shift towards cloud-based database solutions. This preference is fueled by factors such as reduced infrastructure costs, enhanced agility, and improved data accessibility. Key market drivers include the expanding need for robust and scalable database solutions for applications ranging from e-commerce to enterprise resource planning (ERP). Furthermore, the rising demand for data analytics and business intelligence solutions is further propelling market expansion. The competitive landscape is intensely populated by major players including Microsoft, Amazon Web Services (AWS), Google Cloud, Oracle, and Alibaba Cloud, leading to innovation and a diverse range of offerings. These companies continuously enhance their services with improved performance, security features, and managed services options, catering to a broader customer base. Trends such as serverless databases, the increasing adoption of containerization technologies (like Docker and Kubernetes), and the growth of hybrid cloud deployments are reshaping the market landscape. However, challenges like data security concerns and complexities associated with cloud migration may act as restraints on market growth, though these are being addressed through advanced security measures and streamlined migration processes. Looking ahead, the Cloud Database MySQL market is poised for sustained growth, with a projected Compound Annual Growth Rate (CAGR) of approximately 15% from 2025 to 2033. This growth trajectory is underpinned by the continuing digital transformation across industries and the expanding global adoption of cloud technologies. 
Segmentation within the market is likely based on deployment model (public, private, hybrid), pricing models, and industry verticals. The substantial market size, coupled with a healthy CAGR, positions Cloud Database MySQL as a highly attractive and strategically important segment within the broader cloud computing market. The continued innovation and competition among major vendors ensures that the market remains dynamic and responsive to evolving user needs.
Facebook
analyze the health and retirement study (hrs) with r

the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need them for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle.

but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you.

the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

this new github repository contains five scripts:

1992 - 2010 download HRS microdata.R
loop through every year and every file, download, then unzip everything in one big party

import longitudinal RAND contributed files.R
create a SQLite database (.db) on the local disk
load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

longitudinal RAND - analysis examples.R
connect to the sql database created by the 'import longitudinal RAND contributed files' program
create two database-backed complex sample survey objects, using a taylor-series linearization design
perform a mountain of analysis examples with wave weights from two different points in the panel

import example HRS file.R
load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
parse through the IF block at the bottom of the sas importation script, blank out a number of variables
save the file as an R data file (.rda) for fast loading later

replicate 2002 regression.R
connect to the sql database created by the 'import longitudinal RAND contributed files' program
create a database-backed complex sample survey object, using a taylor-series linearization design
exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.

click here to view these five scripts

for more detail about the health and retirement study (hrs), visit:
michigan's hrs homepage
rand's hrs homepage
the hrs wikipedia page
a running list of publications using hrs

notes:

exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file.

note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself.

confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
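The import step described above (create a SQLite database on disk, then load large files in chunks so they never sit in RAM all at once) can be illustrated with a minimal sketch. This is Python, not the R used by the original scripts, and it is not the repository's actual code. The table name `rand_demo`, the column names, the sample rows, and the helper `load_csv_in_chunks` are all hypothetical stand-ins chosen for illustration.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_in_chunks(conn, table, rows, chunk_size=1000):
    """Insert rows from a csv reader into `table`, chunk_size rows at a time.

    Only one chunk is held in memory at any moment. Returns the row count.
    """
    header = next(rows)  # first record holds the column names
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    total = 0
    while True:
        chunk = list(islice(rows, chunk_size))  # read at most chunk_size rows
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', chunk)
        conn.commit()  # commit per chunk so progress is saved as we go
        total += len(chunk)
    return total


# Hypothetical stand-in for a large survey file (not real HRS data).
sample = io.StringIO("respondent_id,wave,weight\n1,1,2500\n2,1,0\n3,1,3100\n")

# ":memory:" keeps the demo self-contained; a real run would pass a .db path.
conn = sqlite3.connect(":memory:")
n_loaded = load_csv_in_chunks(conn, "rand_demo", csv.reader(sample), chunk_size=2)
n_rows = conn.execute('SELECT COUNT(*) FROM "rand_demo"').fetchone()[0]
print(n_loaded, n_rows)
```

With a real file, the same function would take `csv.reader(open(path, newline=""))` and a larger `chunk_size`; the chunk size trades memory use against the number of commits.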
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Combining human expertise with information from book-consumer digital data may generate what it takes to face the following changes in such a critical market. Along with the publishing industry, researchers rely on book-related data to develop tools and applications and to draw constructive conclusions that support better-informed, faster decisions. Such solutions range from best-seller prediction models to natural language processing for classifying raw text. Besides requiring complex Artificial Intelligence (AI) methods, all of them are essentially data-dependent, and dependent on book-related data in particular.
Data, and more specifically data growth, is essential for developing and performing such AI-powered applications. None of these efforts can be achieved without a preliminary collection of data on literary works, readers, and their reading habits. Therefore, it is critically important to build and make available datasets that fully comprise the essential elements of the book industry ecosystem. Although some efforts have been made for English language books, little has been done regarding other lesser-spoken languages, such as Portuguese. The evaluation of specific data is of fundamental importance for literature analysis, as Portuguese has its own literary peculiarities. Hence, we present PPORTAL, a Public domain PORTuguese-lAnguage Literature dataset. PPORTAL's contributions are summarized as follows:
Data integration of numerous public domain works from three digital libraries;
Enriched metadata for works, authors and online reviews extracted from Goodreads;
Feature engineering on the metadata to create meaningful additional features; and
Unrestricted access in two formats (SQL database and compressed .csv files).
Public contracts with the City of Bloomington since 2018.
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
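The example that originally followed is not preserved here. As a stand-in, the sketch below shows one way an HTTP SQL query of this kind could be issued from Python. It rests on assumptions: the endpoint URL and the `{"sql": ...}` JSON payload shape should be checked against the Splitgraph documentation, and the namespace, repository, and table names are placeholders, not the real identifiers for this dataset.

```python
import json
import urllib.request

# ASSUMPTION: endpoint for SQL-over-HTTP queries; verify against the
# Splitgraph documentation before relying on it.
DDN_URL = "https://data.splitgraph.com/sql/query/ddn"


def build_query_request(sql, url=DDN_URL):
    """Build (but do not send) a POST request carrying the SQL as JSON."""
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DDN_URL):
    """Send the query and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_query_request(sql, url), timeout=30) as resp:
        return json.load(resp)


# HYPOTHETICAL table reference: replace with the dataset's real
# "namespace/repository"."table" identifier.
sql = 'SELECT * FROM "example-namespace/example-repository"."example_table" LIMIT 5'

request = build_query_request(sql)
print(request.get_method(), request.full_url)
```

Calling `run_query(sql)` would actually send the request; it is kept separate from `build_query_request` so the payload can be inspected without network access.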
See the Splitgraph documentation for more information.