The South Florida Water Management District (SFWMD) and the U.S. Geological Survey have developed projected future change factors for precipitation depth-duration-frequency (DDF) curves at 174 NOAA Atlas 14 stations in central and south Florida. The change factors were computed as the ratio of projected future to historical extreme precipitation depths fitted to extreme precipitation data from various downscaled climate datasets using a constrained maximum likelihood (CML) approach. The change factors correspond to the period 2050-2089 (centered on the year 2070) as compared to the 1966-2005 historical period. The SFWMD manages the water resources of various interconnected areas in south Florida, which are defined in the SFWMD ArcHydro Enhanced Database (AHED) as “AHED Rain Areas”. The SFWMD is interested in summarizing change factors for each individual AHED Rain Area to use in future planning efforts. Geospatial data provided in an ArcGIS shapefile named “AHED_basins.shp” are described herein. The shapefile contains polygons for the AHED Rain Areas defined in AHED, including their acreages.
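As a rough illustration of the ratio described above, here is a minimal Python sketch. The function name `change_factor` and the depth values are hypothetical and are not taken from the data release; the actual depths come from the CML-fitted DDF curves.

```python
def change_factor(future_depth: float, historical_depth: float) -> float:
    """Ratio of projected future to historical extreme precipitation depth."""
    if historical_depth <= 0:
        raise ValueError("historical depth must be positive")
    return future_depth / historical_depth


# Hypothetical depths (inches) for one station, duration, and return period.
historical = 8.0   # fitted to the 1966-2005 period (assumed value)
future = 9.2       # fitted to the 2050-2089 period (assumed value)

cf = change_factor(future, historical)
print(f"change factor = {cf:.2f}")
```

A change factor above 1 indicates that the projected future depth exceeds the historical depth for that station, duration, and return period.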
Information on building organisations of private buildings in Hong Kong.
https://www.etalab.gouv.fr/licence-ouverte-open-licence
IMOPE is the reference database for buildings at national level. To date and on a daily basis, it supports nearly 20,000 public and private actors and more than 800 territories (in operational contexts: the fight against substandard housing, the fight against vacancy, energy renovation, OPAH-RU, PIG, VOC, ...) seeking to understand and transform the French building sector.
Resulting from public research conducted at Mines Saint-Etienne (Institut Mines Télécom), this breakthrough innovation, whose methods have been patented by the Ministry of the Economy, Industrial and Digital Sovereignty, brings together all the data of interest (more than 250 items of information) on each of the 20 million existing buildings.
Consult the news of the ONB and the national IMOPE database: ACTU ONB/IMOPE
IMOPE has been co-built, since its creation in 2016, with and for local actors (ALEC, ANAH operators, ADIL, DDT, ADEME, EPCI, urban planning agencies, ...) in order to meet the multiple challenges of the building sector. These include energy renovation, combating vacancy, precariousness and unsanitary conditions, attrition of housing, home support, adaptation to climate change, etc.
Sourcing of merged and reprocessed data: a unique, multi-source approach to increase knowledge, merging in particular:
- Open data: BAN, BDTOPO, DVF, DPE (ADEME), consumption data (ENEDIS, GRDF), RPLS, QPV, Georisks, permanent equipment base, SITADEL, socio-economic data (RP, FiLoSoFi, INSEE), OPAH, ...
- "Conventional" data: land files enriched by Cerema (source DGFiP DGALN), LOVAC, non-anonymised owner data, RNIC (ANAH)
- Local or business data: devices, FSL, LHI, orders, procedures, reporting, planning permission, rental permit, ANAH aid, ...
- "Enriched" data: Machine Learning and Deep Learning (DVF, DPE, power source and heating type predictions)
A strong commitment to the commons: U.R.B.S, a spin-off of Mines Saint-Etienne, has maintained, developed and improved the IMOPE database with its own funds since 2019. With a view to mutualisation and openness, U.R.B.S. invites the entire building community (architects, public decision-makers, insurers, artisans, diagnosticians, researchers, citizens, design offices, etc.) to disseminate and reuse the data contained in the IMOPE database widely, internally as well as externally, natively or with post-processing.
It is in this spirit of sharing that we have deployed the **National Building Observatory** (ONB). The **ONB** is a citizen geo-common. As a decision-making tool providing knowledge of the building stock, it makes it easier for everyone to access the information contained in the national IMOPE database.
Convinced that together we will go further, we run the ONB and IMOPE as initiatives led by civil society, of which we are part and which, we are convinced, is the keystone for achieving the energy, climate and social objectives of the building sector.
For more information: https://www.urbs.fr
To contact us: contact@urbs.fr
To access the ONB: https://app.urbs.fr/onb/connection
The Water Quality Portal (WQP) is a cooperative service sponsored by the United States Geological Survey (USGS), the Environmental Protection Agency (EPA), and the National Water Quality Monitoring Council (NWQMC). It serves data collected by over 400 state, federal, tribal, and local agencies. Water quality data can be downloaded in Excel, CSV, TSV, and KML formats. Fourteen site types are found in the WQP: aggregate groundwater use, aggregate surface water use, atmosphere, estuary, facility, glacier, lake, land, ocean, spring, stream, subsurface, well, and wetland. Water quality characteristic groups include physical conditions, chemical and bacteriological water analyses, chemical analyses of fish tissue, taxon abundance data, toxicity data, habitat assessment scores, and biological index scores, among others. Within these groups, thousands of water quality variables registered in the EPA Substance Registry Service (https://iaspub.epa.gov/sor_internet/registry/substreg/home/overview/home.do) and the Integrated Taxonomic Information System (https://www.itis.gov/) are represented. Across all site types, physical characteristics (e.g., temperature and water level) are the most common water quality result type in the system. The Water Quality Exchange data model (WQX; http://www.exchangenetwork.net/data-exchange/wqx/), initially developed by the Environmental Information Exchange Network, was adapted by EPA to support submission of water quality records to the EPA STORET Data Warehouse [USEPA, 2016], and has subsequently become the standard data model for the WQP.
Contributing organizations:
ACWI: The Advisory Committee on Water Information (ACWI) represents the interests of water information users and professionals in advising the federal government on federal water information programs and their effectiveness in meeting the nation's water information needs.
ARS: The Agricultural Research Service (ARS) is the U.S. Department of Agriculture's chief in-house scientific research agency, whose job is finding solutions to agricultural problems that affect Americans every day, from field to table. ARS conducts research to develop and transfer solutions to agricultural problems of high national priority and provide information access and dissemination to, among other topics, enhance the natural resource base and the environment. Water quality data from STEWARDS, the primary database for the USDA/ARS Conservation Effects Assessment Project (CEAP), are ingested into WQP via a web service.
EPA: The Environmental Protection Agency (EPA) gathers and distributes water quality monitoring data collected by states, tribes, watershed groups, other federal agencies, volunteer groups, and universities through the Water Quality Exchange framework in the STORET Warehouse.
NWQMC: The National Water Quality Monitoring Council (NWQMC) provides a national forum for coordination of comparable and scientifically defensible methods and strategies to improve water quality monitoring, assessment, and reporting. It also promotes partnerships to foster collaboration, advance the science, and improve management within all elements of the water quality monitoring community.
USGS: The United States Geological Survey (USGS) investigates the occurrence, quantity, quality, distribution, and movement of surface waters and ground waters and disseminates the data to the public, state, and local governments, public and private utilities, and other federal agencies involved with managing the United States' water resources.
Resources in this dataset:
Resource Title: Website Pointer for Water Quality Portal. File Name: Web Page, url: https://www.waterqualitydata.us/
Links to Download Data, User Guide, Contributing Organizations, National coverage by state.
https://www.datainsightsmarket.com/privacy-policy
The Database Platform as a Service (DBPaaS) market is experiencing robust growth, driven by the increasing adoption of cloud computing, the need for scalable and cost-effective database solutions, and the rising demand for data analytics. The market's expansion is fueled by businesses migrating legacy on-premise databases to cloud-based alternatives, seeking enhanced agility, and leveraging the advantages of pay-as-you-go models. Major players like Amazon Web Services, Microsoft Azure, and Google Cloud Platform dominate the market, offering a wide range of DBPaaS options catering to diverse needs, from relational databases to NoSQL solutions. The market is segmented by deployment model (public cloud, private cloud, hybrid cloud), database type (SQL, NoSQL, NewSQL), and industry vertical (BFSI, healthcare, retail, etc.). Competition is fierce, with established players constantly innovating and new entrants emerging to challenge the status quo. Factors like data security concerns and integration complexities pose some challenges to market growth. However, advancements in serverless computing and the increasing adoption of artificial intelligence (AI) and machine learning (ML) are expected to drive further expansion. The forecast period (2025-2033) is projected to witness substantial growth, driven by ongoing digital transformation initiatives across various industries. The increasing adoption of cloud-native applications and microservices architectures further necessitates robust and scalable DBPaaS solutions. While the initial investment in migrating to the cloud can be significant, the long-term cost savings and improved efficiency make DBPaaS an attractive option. The market's growth is expected to be particularly strong in regions with high cloud adoption rates and robust digital infrastructure. 
The competitive landscape will likely remain dynamic, with mergers and acquisitions, strategic partnerships, and continuous product innovation shaping the market's trajectory. Overall, the DBPaaS market is poised for substantial growth, driven by a confluence of technological advancements and evolving business needs. Assuming a conservative CAGR of 20% (a reasonable estimate considering the high growth sectors involved), and a 2025 market size of $50 Billion, we can project substantial future growth.
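The projection mentioned above can be worked through with the standard compound-growth formula, size_n = size_0 × (1 + CAGR)^n. The following is a minimal sketch using the stated 20% CAGR and USD 50 billion 2025 base; the function name and the choice of 2033 as the end year (eight compounding steps from 2025) are assumptions made for illustration.

```python
def project_market_size(base_size: float, cagr: float, years: int) -> float:
    """Compound a base market size forward by `years` at a constant CAGR."""
    return base_size * (1 + cagr) ** years


BASE_YEAR = 2025
BASE_SIZE_BILLION = 50.0   # stated 2025 market size, USD billions
CAGR = 0.20                # stated growth rate

# Print the projected size for each year of the 2025-2033 forecast period.
for year in range(BASE_YEAR, 2034):
    size = project_market_size(BASE_SIZE_BILLION, CAGR, year - BASE_YEAR)
    print(f"{year}: USD {size:.1f} billion")
```

Under these assumptions the market would reach roughly USD 215 billion in 2033, a bit more than four times the 2025 base.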
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global In-Memory Database market size was USD 7.8 billion in 2024 and will expand at a compound annual growth rate (CAGR) of 19.1% from 2024 to 2031.
Market Dynamics of In-Memory Database Market
Key Drivers for In-Memory Database Market
Increasing Volume of Data - The exponential growth of data generated by various sources, including social media, IoT devices, and enterprise applications, is a key driver for the in-memory database (IMDB) market. Organizations are increasingly seeking efficient ways to manage and analyze this vast amount of data to gain actionable insights and maintain a competitive edge. In-memory databases are well-suited to handle large volumes of data with high throughput, providing the scalability needed to accommodate the growing data influx. The ability to scale horizontally by adding more nodes to the database cluster ensures that IMDBs can meet the demands of data-intensive applications.
The increasing dependence on real-time analytics and decision-making is anticipated to drive the In-Memory Database market's expansion in the years ahead.
Key Restraints for In-Memory Database Market
The amount of available RAM limits the scalability of in-memory databases for very large datasets, which in turn constrains the growth of the In-Memory Database industry.
The market also faces significant difficulties related to the high cost of implementation.
Introduction of the In-Memory Database Market
The In-Memory Database market is experiencing robust growth, driven by the need for high-speed data processing and real-time analytics across various industries. In-memory databases store data directly in the main memory (RAM) rather than on traditional disk storage, allowing for significantly faster data retrieval and manipulation. This technology is particularly advantageous for applications requiring rapid transaction processing and real-time data insights, such as financial services, telecommunications, and e-commerce. Despite its benefits, the market faces challenges, including high implementation costs and limitations on data storage capacity due to RAM constraints. Additionally, concerns about data volatility and the need for continuous power supply further complicate adoption. However, advancements in memory technology, declining costs of RAM, and the increasing demand for real-time analytics are driving market growth. As businesses seek to enhance performance and decision-making capabilities, the In-Memory Database market is poised for continued expansion, providing critical solutions for high-performance data management.
Georeferenced vector database, containing the areas bearing reports of travertine deposits, mainly associated with springs or "Limestone Precipitating Springs", detected at a scale of 1:10,000 in the Emilia-Romagna Apennines and approximated to polygons. In the tabular content, unpublished data (taken from the Author's personal knowledge) are differentiated from those taken from existing databases, such as for example the regional databases of the Geological Map of the Emilia-Romagna Apennines at 1:10,000 scale or of the Habitat map of Emilia-Romagna. The stratigraphic-structural domains to which the geological units within which the reports are found belong are also indicated.
Research dissemination and knowledge translation are imperative in social work. Methodological developments in data visualization techniques have improved the ability to convey meaning and reduce erroneous conclusions. The purpose of this project is to examine: (1) How are empirical results presented visually in social work research?; (2) To what extent do top social work journals vary in the publication of data visualization techniques?; (3) What is the predominant type of analysis presented in tables and graphs?; (4) How can current data visualization methods be improved to increase understanding of social work research? Method: A database was built from a systematic literature review of the four most recent issues of Social Work Research and six other highly ranked journals in social work based on the 2009 5-year impact factor (Thomson Reuters ISI Web of Knowledge). Overall, 294 articles were reviewed. Articles without any form of data visualization were not included in the final database. The number of articles reviewed by journal includes: Child Abuse & Neglect (38), Child Maltreatment (30), American Journal of Community Psychology (31), Family Relations (36), Social Work (29), Children and Youth Services Review (112), and Social Work Research (18). Articles with any type of data visualization (table, graph, other) were included in the database and coded sequentially by two reviewers based on the type of visualization method and type of analyses presented (descriptive, bivariate, measurement, estimate, predicted value, other). Additional review was required from the entire research team for 68 articles. Codes were discussed until 100% agreement was reached. The final database includes 824 data visualization entries.
The Swedish Contextual Database provides a large number of longitudinal and regional macro-level indicators primarily assembled to facilitate research on the effects of contextual factors on family and fertility behavior. It can be linked to the individual-level data of the Swedish GGS as well as to data of other surveys. It can also be used for other types of research and for teaching. The comparative data will also be integrated into the international Contextual Database of the GGP. The contextual data are available open-access through the GGP webpage (www.ggp-i.org) and through the webpage of the Stockholm University Demography Unit (www.suda.su.se).
Purpose:
The Swedish contextual database (CDB) was established to accompany the Swedish Generations and Gender Survey (GGS) and to complement the contextual database of the international Generations and Gender Programme (GGP).
The Swedish Contextual Data Collection is available in xls format. In addition to that, the internationally comparative data will be integrated into the Contextual Database (CDB) of the GGP in 2018. These data can be exported in other formats, as well (e.g. CSV, XML). The indicators can also be accessed in a single file in STATA or SPSS format. The data can be matched with the Swedish GGS. International regional coding schemes are also supported, such as NUTS, OECD.
This brief presents the latest findings from NYTD surveys completed by youth in NYTD Cohort 2 at ages 17, 19, and 21 (in FY 19). Metadata-only record linking to the original dataset. Open original dataset below.
Sang Sin Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Database Management System (DBMS) market size reached USD 85.5 billion in 2024, reflecting the sector’s robust expansion across various industries. The market is expected to grow at a CAGR of 11.8% from 2025 to 2033, culminating in a forecasted market size of USD 231.7 billion by 2033. This impressive growth is primarily driven by the escalating volume of data generated by digital transformation initiatives, rising adoption of cloud-based solutions, and the increasing complexity of enterprise data ecosystems.
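The figures quoted above can be cross-checked by back-calculating the growth rate they imply. The sketch below assumes nine compounding years, from the 2024 base to the 2033 forecast; that reading, like the function name, is an assumption, since the text states the CAGR over 2025 to 2033.

```python
def implied_cagr(start_size: float, end_size: float, periods: int) -> float:
    """Constant annual growth rate that takes start_size to end_size."""
    if start_size <= 0 or periods <= 0:
        raise ValueError("start_size and periods must be positive")
    return (end_size / start_size) ** (1 / periods) - 1


START_BILLION = 85.5    # stated 2024 market size, USD billions
END_BILLION = 231.7     # stated 2033 forecast, USD billions
PERIODS = 2033 - 2024   # nine compounding years (assumed reading)

rate = implied_cagr(START_BILLION, END_BILLION, PERIODS)
print(f"implied CAGR = {rate:.1%}")
```

Under this reading the implied rate is roughly 11.7%, close to the stated 11.8%, which suggests the forecast compounds from the 2024 base.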
One of the key growth factors for the Database Management System market is the proliferation of big data analytics and the need for real-time data processing. Organizations across sectors such as BFSI, healthcare, retail, and manufacturing are leveraging advanced DBMS solutions to derive actionable insights from massive datasets. The integration of artificial intelligence and machine learning into database management systems is further enhancing their analytical capabilities, enabling predictive analytics, automated data governance, and anomaly detection. As businesses continue to digitize their operations, the demand for scalable, secure, and high-performance DBMS platforms is expected to surge, fueling market expansion.
Another significant driver is the widespread migration to cloud-based database architectures. Enterprises are increasingly opting for cloud deployment due to its flexibility, cost-effectiveness, and ease of scalability. Cloud-based DBMS solutions allow organizations to manage data across multiple geographies with minimal infrastructure investment, supporting global expansion and remote work trends. The growth of hybrid and multi-cloud environments is also propelling the need for database management systems that can seamlessly integrate and synchronize data across diverse platforms. This shift is compelling vendors to innovate and offer more robust, cloud-native DBMS offerings.
The evolution of database types, particularly the rise of NoSQL and in-memory databases, is transforming the DBMS market landscape. Traditional relational databases are now complemented by NoSQL databases that cater to unstructured and semi-structured data, supporting use cases in IoT, social media, and real-time analytics. In-memory databases, known for their low latency and high throughput, are gaining traction in applications requiring instantaneous data access. This diversification of database technologies is enabling organizations to choose best-fit solutions for their specific needs, contributing to the overall growth and dynamism of the market.
From a regional perspective, North America dominates the Database Management System market due to its advanced IT infrastructure, high cloud adoption rates, and strong presence of major technology providers. However, Asia Pacific is witnessing the fastest growth, driven by rapid digitalization in emerging economies, increasing investments in IT modernization, and the expansion of e-commerce and fintech sectors. Europe, Latin America, and the Middle East & Africa are also experiencing steady growth, supported by regulatory compliance initiatives and the modernization of legacy systems. The global nature of data-driven business models ensures that demand for sophisticated DBMS solutions remains strong across all regions.
The Database Management System market by component is segmented into software and services, each playing a pivotal role in the overall ecosystem. The software segment encompasses various types of DBMS platforms, including relational, NoSQL, and in-memory databases, which form the backbone of enterprise data management strategies. This segment holds the largest market share, driven by continuous innovations in database architectures, enhanced security features, and integration capabilities with emerging technologies such as AI and IoT. Organizations are increasingly investing in advanced DBMS software to manage the growing complexity and volume of data, ensure data integrity, and support mission-critical applications.
On the other hand, the services segment, which includes consulting, implementation, support, and maintenance, is experiencing rapid growth as enterprises seek to optimize their database environments. The complexity of modern database systems necessitates expert
This data release (version 4.0, February 2021) consists of a Microsoft® Access database and Microsoft® Excel workbook that contain water-level data and other hydrologic information for wells on and near the Nevada Test Site. The three worksheets in the Microsoft® Excel workbook also are provided as individual comma-separated values (CSV) files. The data release supports U.S. Geological Survey Data Series 533 (https://pubs.usgs.gov/ds/533/). The Microsoft® Access database contains water levels measured from 925 wells in and near areas of underground nuclear testing at the Nevada Test Site. The water-level measurements were collected from 1941 to 2020. All water levels in the Microsoft® Access database are stored in the USGS National Water Information System (NWIS) database available at https://waterdata.usgs.gov/nv/nwis. The Microsoft® Access database also provides information for each well (well construction, borehole lithology, units contributing water to the well, and general site remarks) and water-level measurement (measurement source, status, method, accuracy, and specific water-level remarks). Additionally, the database provides hydrograph descriptions (hereinafter hydrograph narratives) that document the water-level history and describe and interpret the water-level hydrograph for each well. Multiple condition flags were assigned to each water-level measurement to describe the hydrologic conditions at the time of measurement. The condition flags describe the general quality (accuracy), temporal variability, regional significance, and hydrologic conditions of the measurements. The Microsoft® Excel workbook contains hydrographs and locations for the 925 wells, which are interactively presented in the workbook as an interface to the Microsoft® Access database. This workbook is designed to be an easy-to-use tool to obtain the water-level history for any well in the study area. 
Water-level data can be restricted to certain wells, dates, or hydrologic conditions by using the Microsoft® Excel built-in AutoFilter. Additional information provided in the workbook includes selected well-site information, water-level information, contributing units, the hydrograph narratives, and hyperlinks to the NWISWeb (http://waterdata.usgs.gov/nv/nwis/) site home page for each well. Information presented in the workbook for all water levels in the database also includes measurement source, status, method, accuracy, remarks, and hydrologic condition flags. Interpretations for individual water-level measurements and for the period of record for the wells have been incorporated into the water-level remarks, flags, or hydrograph narratives.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This database contains geo-referenced points of Leishmania species occurrence in America.
A database of imaging probes useful for preclinical and clinical studies. The National Institute of Mental Health (NIMH) and the Society for Non-Invasive Imaging in Drug Development (SNIDD) are in the process of creating a centralized, searchable PET, SPECT, and MRI tracer database as a resource for the scientific community. The goal of this effort is to promote the use of imaging probes in preclinical and clinical research and in drug discovery to accelerate the identification and validation of novel targets for therapeutic intervention in human diseases, especially those with central nervous system components. NIMH will maintain the tracer database as part of the Psychoactive Drug Screening Program (PDSP). The database will contain records for each radiotracer with relevant information such as target, research uses, pharmacology, pharmacokinetics, synthesis protocols, toxicology and safety data, dosimetry, other clinical data, IND info, permission to cross-reference pharmacology, toxicology, or safety data in a drug master file (if an IND exists), contact information, patent, etc. with appropriate safeguards in place to protect the intellectual property of proprietary compounds.
The National Bioscience Database Center (NBDC) intends to integrate all databases for life sciences in Japan, by linking each database with expediency to maximize convenience and make the entire system more user-friendly. We aim to focus our attention on the needs of the users of these databases who have all too often been neglected in the past, rather than the needs of the people tasked with the creation of databases. It is important to note that we will continue to honor the independent integrity of each database that will contribute to our endeavor, as we are fully aware that each database was originally crafted for specific purposes and divergent goals.
Services:
* Database Catalog - A catalog of life science related databases constructed in Japan that are also available in English. Information such as URL, status of the database site (active vs. inactive), database provider, type of data and subjects of the study is provided for each database record.
* Life Science Database Cross Search - A service for simultaneous searching across scattered life-science databases, ranging from molecular data to patents and literature.
* Life Science Database Archive - Maintains and stores the datasets generated by life scientists in Japan in a long-term and stable state as national public goods. The Archive makes it easier for many people to search datasets by metadata in a unified format, and to access and download the datasets with clear terms of use.
* Taxonomy Icon - A collection of icons (illustrations) of biological species that is free to use and distribute. There are more than 200 icons of various species including Bacteria, Fungi, Protista, Plantae and Animalia.
* GenLibi (Gene Linker to bibliography) - An integrated database of human, mouse and rat genes that includes automatically integrated gene, protein, polymorphism, pathway, phenotype, ortholog/protein sequence information, and manually curated gene function and gene-related or co-occurring Disease/Phenotype and bibliography information.
* Allie - A search service for abbreviations and long forms utilized in life sciences. It addresses the problem that many abbreviations are used in the literature, and that polysemous or synonymous abbreviations appear frequently, making it difficult to read and understand scientific papers outside the reader's expertise.
* inMeXes - A search service for English expressions (multiple words) that appear no less than 10 times in PubMed/MEDLINE titles or abstracts. In addition, you can easily access the sentences where the expression was used or other related information by clicking one of the search results.
* HOWDY (Human Organized Whole genome Database) - A database system for retrieving human genome information from 14 public databases by using official symbols and aliases. The information is updated daily by extracting data automatically from the genetic databases, and records sharing common identifiers are shown together and linked to one another.
* MDeR (the MetaData Element Repository in life sciences) - A web-based tool designed to let you search, compare and view Data Elements. MDeR is based on ISO/IEC 11179 Part 3 (Registry metamodel and basic attributes).
* Human Genome Variation Database - A database for accumulating all kinds of human genome variations detected by various experimental techniques.
* MEDALS - A portal site that provides information about databases, analysis tools, and the relevant projects that were conducted with financial support from the Ministry of Economy, Trade and Industry of Japan.
Final Rule: National Youth In Transition Database (PDF) - This final rule implements the data collection requirements of the Foster Care Independence Act, enacted in 1999. Metadata-only record linking to the original dataset. Open original dataset below.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
We apply a research approach that can inform riparian restoration planning by developing products that show recent trends in vegetation conditions, identify areas potentially more at risk for degradation, and describe the associated relationship between riparian vegetation dynamics and climate conditions. The full suite of data products and a link to the associated publication addressing this analysis can be found on the Parent data release. To characterize the climate conditions across the study period, we use the Standardized Precipitation Evapotranspiration Index (SPEI). The SPEI is a water balance index which includes both precipitation and evapotranspiration in its calculation. Conditions from the prior n months, generally ranging from 1 to 60, are compared to the same respective period over the prior years to identify the index value (Vicente-Serrano et al., 2010). Values generally range from -3 to 3, where values less than 0 suggest drought conditions while values greater than 0 su ...
https://spdx.org/licenses/CC0-1.0.html
GC skew denotes the relative excess of G nucleotides over C nucleotides on the leading versus the lagging replication strand of eubacteria. While the effect is small, typically around 2.5%, it is robust and pervasive. GC skew and the analogous TA skew are a localized deviation from Chargaff’s second parity rule, which states that G and C, and T and A occur with (mostly) equal frequency even within a strand.
Most bacteria also show the analogous TA skew. Different phyla show different kinds of skew and differing relations between TA and GC skew. This article introduces an open access database (https://skewdb.org) of GC and 10 other skews for over 28,000 chromosomes and plasmids. Further details like codon bias, strand bias, strand lengths and taxonomic data are also included.
The SkewDB database can be used to generate or verify hypotheses. Since the origins of both the second parity rule and GC skew itself are not yet satisfactorily explained, such a database may enhance our understanding of microbial DNA.
Methods
The SkewDB analysis relies exclusively on the tens of thousands of FASTA and GFF3 files available through the NCBI download service, which covers both GenBank and RefSeq. The database includes bacteria, archaea and their plasmids. Furthermore, to ease analysis, the NCBI Taxonomy database is sourced and merged so output data can quickly be related to (super)phyla or specific species. No other data is used, which greatly simplifies processing. Data is read directly in the compressed format provided by NCBI.
All results are emitted as standard CSV files. In the first step of the analysis, for each organism the FASTA sequence and the GFF3 annotation file are parsed. Every chromosome in the FASTA file is traversed from beginning to end, while a running total is kept for cumulative GC and TA skew. In addition, within protein coding genes, such totals are also kept separately for these skews on the first, second and third codon position. Furthermore, separate totals are kept for regions which do not code for proteins. In addition, to enable strand bias measurements, a cumulative count is maintained of nucleotides that are part of a positive or negative sense gene. The counter is increased for positive sense nucleotides, decreased for negative sense nucleotides, and left alone for non-genic regions.
A separate counter is kept for non-genic nucleotides. Finally, G and C nucleotides are counted, regardless of whether they are part of a gene or not. These running totals are emitted at 4096-nucleotide intervals, a resolution suitable for determining skews and shifts. In addition, one-line summaries are stored for each chromosome. Each line includes the RefSeq identifier of the chromosome, the full name mentioned in the FASTA file, plus counts of A, C, G and T nucleotides. Finally, five levels of taxonomic data are stored.
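The running totals described above can be sketched as follows. This is a minimal illustration, not the SkewDB code: it tracks only whole-sequence GC and TA skew (no codon positions, strand bias, or non-genic counters), and it assumes GC skew counts G minus C and TA skew counts T minus A. The function name `cumulative_skews` is hypothetical.

```python
def cumulative_skews(sequence, interval=4096):
    """Return (position, cumulative GC skew, cumulative TA skew) rows.

    A row is emitted every `interval` nucleotides. Assumes GC skew is the
    running count of G minus C, and TA skew the running count of T minus A.
    """
    gc = 0
    ta = 0
    rows = []
    for position, base in enumerate(sequence.upper(), start=1):
        if base == "G":
            gc += 1
        elif base == "C":
            gc -= 1
        elif base == "T":
            ta += 1
        elif base == "A":
            ta -= 1
        # Other symbols (for example N) leave both totals unchanged.
        if position % interval == 0:
            rows.append((position, gc, ta))
    return rows
```

Calling `cumulative_skews(sequence)` with the default interval mirrors the 4096-nucleotide resolution mentioned in the text, and a smaller interval is convenient for checking the logic on short test strings.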
Chromosomes and plasmids of fewer than 100 thousand nucleotides are ignored, as these are too noisy to model faithfully. Plasmids are clearly marked in the database, enabling researchers to focus on chromosomes if so desired.
Fitting
Once the genomes have been summarised at 4096-nucleotide resolution, the skews are fitted to a simple model. The fits are based on four parameters. Alpha1 and alpha2 denote the relative excess of G over C on the leading and lagging strands. If alpha1 is 0.046, this means that for every 1000 nucleotides on the leading strand, the cumulative count of G excess increases by 46. The third parameter is div, and it describes how the chromosome is divided over leading and lagging strands. If this number is 0.557, the leading replication strand is modeled to make up 55.7% of the chromosome. The final parameter is shift (the dotted vertical line), which denotes the offset of the origin of replication compared to the DNA FASTA file. This parameter has no biological meaning in itself, and is an artifact of the DNA assembly process.
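One way to read the four parameters is as a piecewise-linear model of cumulative skew on a circular chromosome. The sketch below works under stated assumptions and is not taken from the SkewDB source: alpha2 is treated as a signed slope (pass a negative value if the lagging strand shows a C excess), shift is the FASTA position of the origin, and the function name `predicted_cumulative_skew` is hypothetical.

```python
def predicted_cumulative_skew(pos, length, alpha1, alpha2, div, shift):
    """Modeled cumulative skew from FASTA position 0 up to `pos`.

    Assumptions (not confirmed by the text): slope alpha1 applies to the
    first `div` fraction of the chromosome counted from the origin, the
    signed slope alpha2 applies to the remainder, and the chromosome is
    circular with the origin located at FASTA position `shift`.
    """
    lead_len = div * length

    def from_origin(u):
        # Cumulative skew after u nucleotides, counted from the origin.
        if u <= lead_len:
            return alpha1 * u
        return alpha1 * lead_len + alpha2 * (u - lead_len)

    start = (-shift) % length  # origin-relative coordinate of FASTA position 0
    end = start + pos
    if end <= length:
        return from_origin(end) - from_origin(start)
    # The span wraps past the end of the circular chromosome.
    return (from_origin(length) - from_origin(start)) + from_origin(end - length)
```

With alpha1 set to 0.046, div set to 1.0 and no shift, the model gives a cumulative excess of 46 after 1000 nucleotides, which matches the example in the text.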
The goodness-of-fit number consists of the root mean squared error of the fit, divided by the absolute mean skew. This latter correction is made to not penalize good fits for bacteria showing significant skew. GC skew tends to be defined very strongly, and it is therefore used to pick the div and shift parameters of the DNA sequence, which are then kept as a fixed constraint for all the other skews, which might not be present as clearly. The fitting process itself is a downhill simplex method optimization over the three dimensions, seeded with the average observed skew over the whole genome, and assuming there is no shift, and that the leading and lagging strands are evenly distributed. The simplex optimization is tuned so that it takes sufficiently large steps so it can reach the optimum even if some initial assumptions are off.
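The goodness-of-fit number can be sketched as below. The text does not spell out "absolute mean skew", so this sketch assumes it is the mean of the absolute observed cumulative skew values; the function name `goodness_of_fit` is hypothetical, and the simplex optimization itself is not shown.

```python
import math


def goodness_of_fit(observed, predicted):
    """Root mean squared error divided by the mean absolute observed skew.

    Assumption: "absolute mean skew" is read as the mean of the absolute
    observed values. Lower results indicate a better fit.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    rmse = math.sqrt(squared_error / len(observed))
    mean_abs_skew = sum(abs(o) for o in observed) / len(observed)
    if mean_abs_skew == 0:
        raise ValueError("observed skew is zero everywhere")
    return rmse / mean_abs_skew
```

Dividing by the skew magnitude means that the same absolute error counts for less on a genome with strong skew, which is the correction the text describes.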
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MATLAB files of standardized gene expression profiles derived from stored in situ hybridizations. The profiles are provided both as separate numerical arrays in labeled .mat files and as a two-column array of cells in the file “allarrs.mat”. For the separate arrays, the labels are the filenames; for the array of cells, the labels are the character arrays in the first column. The labels include the gene name, the developmental stage and, if applicable, the sequence number. For example, the file “admp-relatedbla1.mat” contains the variable “profile”, which is a 1×100 numerical array from the first image during the blastula of admp-related. This numerical array is also located in the second column of the 252×2 cell array called “expressiondata” in the file “allarrs.mat”, behind character array “admp-relatedbla1” in the first column. (cle = cleavage, bla = blastula, ega = early gastrula, mga = mid gastrula, lga = late gastrula, epl = early planula, pla = planula, lpl = late planula). The cell array has been converted to comma separated table “expressiontable.txt”, to be processed outside MATLAB and in modified MATLAB releases. The text file has been produced with the script “exportexpression.m” and can be restored to cell array “expressiondata0” in file “allarrs0.mat” with the script “importexpression.m”. (RAR 230 kb)
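For processing the labels outside MATLAB, a small parser can split each one into gene name, developmental stage and sequence number. The sketch assumes the layout suggested by the single example "admp-relatedbla1": gene name, then a three-letter stage code, then an optional sequence number, with no separators. The function name `parse_label` is hypothetical.

```python
import re

# Stage codes as listed in the dataset description.
STAGES = {
    "cle": "cleavage",
    "bla": "blastula",
    "ega": "early gastrula",
    "mga": "mid gastrula",
    "lga": "late gastrula",
    "epl": "early planula",
    "pla": "planula",
    "lpl": "late planula",
}

# Assumed layout: <gene><stage code><optional sequence number>.
LABEL_RE = re.compile(
    r"^(?P<gene>.+?)(?P<stage>" + "|".join(STAGES) + r")(?P<seq>\d*)$"
)


def parse_label(label):
    """Split a profile label into (gene, stage name, sequence number or None)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    seq = int(match.group("seq")) if match.group("seq") else None
    return match.group("gene"), STAGES[match.group("stage")], seq
```

Because the stage code must be followed only by digits up to the end of the label, a gene name that happens to contain a stage code in the middle is still parsed from the right-hand end.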