Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
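For reference, the quantities named above can be written out as follows. This is a sketch using standard textbook definitions, not the estimator from the paper; the symbols $X$ (discrete variable), $Y$ (continuous variable), $P_i$ (distribution of data set $i$) and $\pi_i$ (its weight) are notation assumed here.

```latex
% Mutual information between a discrete X and a continuous Y
I(X;Y) = \sum_{x} \int p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dy
       = H(Y) - \sum_{x} p(x)\,H(Y \mid X = x)

% Jensen-Shannon divergence of n data sets with weights \pi_i
\mathrm{JSD}_{\pi}(P_1,\dots,P_n)
       = H\!\Big(\sum_{i=1}^{n} \pi_i P_i\Big) - \sum_{i=1}^{n} \pi_i\,H(P_i)
```

If $X = i$ labels the data set a sample was drawn from, with $p(X = i) = \pi_i$ and $Y \mid X = i \sim P_i$, the two expressions coincide. This is why an MI estimator for one discrete and one continuous variable can also be used to estimate the Jensen–Shannon divergence.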
Access to up-to-date socio-economic data is a widespread challenge in Papua New Guinea and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.
For PNG, after five rounds of data collection from 2020 to 2022, monthly HFPS data collection commenced in April 2023 and continued for 18 months (ending September 2024), covering topics including employment, income, food security, health, food prices, assets and well-being. This followed an initial pilot of the data collection from January 2023 to March 2023. Data for April 2023 to September 2023 were a repeated cross section, while October 2023 established the first month of a panel, which is ongoing as of March 2025. For each month, approximately 550-1000 households were interviewed. The sample is representative of urban and rural areas but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in PNG. There is one data file for household-level data with a unique household ID, and separate files for individual-level data within each household and for household food price data, both of which can be matched to the household file using the household ID. A unique individual ID within the household data can be used to track individuals over time within households.
Urban and rural areas of Papua New Guinea
Household, Individual
Sample survey data [ssd]
The initial sample was drawn through Random Digit Dialing (RDD) with geographic stratification from a large random sample of Digicel’s subscribers. As an objective of the survey was to measure changes in household economic wellbeing over time, the HFPS sought to contact a consistent number of households across each province month to month. This was initially a repeated cross section from April 2023-Dec 2023. The resulting overall sample has a probability-based weighted design, with a proportionate stratification to achieve a proper geographical representation. More information on sampling for the cross-sectional monthly sample can be found in previous documentation for the PNG HFPS data.
A monthly panel was established in October 2023 and is ongoing as of March 2025. In each subsequent round of data collection after October 2023, the survey firm would first attempt to contact all households from the previous month, and then attempt to contact households from earlier months that had dropped out. After previous numbers were exhausted, RDD with geographic stratification was used for replacement households.
Computer Assisted Telephone Interview [cati]
The questionnaire, which can be found in the External Resources of this documentation, is in English with a Pidgin translation.
The survey instrument for Q1 2025 consists of the following modules:
-1. Basic Household Information
-2. Household Roster
-3. Labor
-4a. Food Security
-4b. Food Prices
-5. Household Income
-6. Agriculture
-8. Access to Services
-9. Assets
-10. Wellbeing and Shocks
-10a. WASH
The raw data were cleaned by the World Bank team using STATA. This included formatting and correcting errors identified through the survey’s monitoring and quality control process. The data are presented in two datasets: a household dataset and an individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, food prices, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (id_member) can be found in the individual dataset.
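The link between the two files can be sketched as below. This is a minimal illustration in plain Python, assuming each file has been read into a list of dictionaries. Only the identifier names `hhid` and `id_member` come from the description above; the other fields (`month`, `age`), the sample rows and the helper name `attach_members` are hypothetical.

```python
from collections import defaultdict


def attach_members(households, individuals, key="hhid"):
    """Return household rows with their individual rows grouped under 'members'."""
    members_by_household = defaultdict(list)
    for person in individuals:
        members_by_household[person[key]].append(person)

    merged = []
    for household in households:
        row = dict(household)  # copy so the input rows are not modified
        row["members"] = members_by_household.get(household[key], [])
        merged.append(row)
    return merged


# Hypothetical rows; real files hold many more variables.
households = [
    {"hhid": "H001", "month": "2023-10"},
    {"hhid": "H002", "month": "2023-10"},
]
individuals = [
    {"hhid": "H001", "id_member": 1, "age": 34},
    {"hhid": "H001", "id_member": 2, "age": 17},
    {"hhid": "H002", "id_member": 1, "age": 52},
]

merged = attach_members(households, individuals)
for row in merged:
    print(row["hhid"], [m["id_member"] for m in row["members"]])
```

Because the data are monthly, a real merge would probably need to match on the survey month as well as `hhid`. That is an assumption about the file layout, not something stated in the documentation.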
Provides an aggregate of data for the Office of the Actuary and the Office of Research, Evaluation and Statistics.
Access to up-to-date socio-economic data is a widespread challenge in Tonga and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details. For Tonga, after two rounds of data collection in 2022, monthly HFPS data collection commenced in April 2023 and continued until November 2024 (with some gaps in the months of collection). The survey collected socio-economic data on topics including employment, income, food security, health, food prices, assets and well-being. Each month of collection has approximately 415 households in the sample and is representative of urban and rural areas. This dataset contains combined monthly survey data for all months of the continuous HFPS in Tonga.
National urban and rural areas (5 islands): Tongatapu, Vava'u, Ha'apai, Eua, Ongo Niua
Individual and household.
Sample survey data [ssd]
The Tonga High Frequency Phone Survey (HFPS) monthly sample was generated in three ways. The first method is a Random Digit Dialing (RDD) process covering all cell telephone numbers active at the time of the sample selection. The RDD methodology generates virtually all possible telephone numbers in the country under the national telephone numbering plan and then draws a random sample of numbers. This method guarantees full coverage of the population with a phone.
First, a large first-phase sample of cell phone numbers was selected and screened through an automated process to identify the active numbers. Then, a smaller second-phase sample was selected from the active residential numbers identified in the first-phase sample and was delivered to the data collection team to be called by the interviewers. When a cell phone was called, the call answerer was interviewed as long as he or she was 18 years of age or above and knowledgeable about the household activities.
It was initially planned to stratify the sample by island group based on the phone number prefixes. However, this was not feasible given the high internal migration across islands and the atypical assignment of phone number prefixes across islands in Tonga. The raw sample overrepresents urban areas and the population of Tongatapu.
Computer Assisted Telephone Interview [cati]
The questionnaire was developed in both English and Tongan and can be found in this documentation in Excel format. Sections of the Questionnaire are provided below: 1. Interview information and Basic information 2. Household roster 3. Labor 4. Food security and food prices 5. Household income 6. Agriculture 7. Social protection 8. Access to services 9. Assets 10. Education 11. Follow up
At the end of data collection, the raw dataset was cleaned by the survey firm and the World Bank team. Data cleaning mainly included formatting, relabeling, and excluding survey monitoring variables (e.g., interview start and end times). Data was edited using the software Stata.
As per our latest research, the global Continuous Data Protection (CDP) market size reached USD 4.8 billion in 2024, driven by the increasing need for robust data security and real-time backup solutions across various industries. The market is exhibiting a strong compound annual growth rate (CAGR) of 12.1% from 2025 to 2033. By the end of 2033, the Continuous Data Protection market is forecasted to attain a value of approximately USD 13.5 billion. The primary growth factor is the rising frequency of ransomware attacks and data breaches, compelling organizations to invest in advanced data protection and disaster recovery solutions.
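The growth figures quoted above can be checked with a short compound-growth calculation. This is a sketch that assumes nine compounding years from the 2024 base to the end of 2033. The report does not state its exact convention, so the function names and the year count are assumptions.

```python
def project_value(base_value, annual_rate, years):
    """Compound a base value forward at a constant annual growth rate."""
    return base_value * (1 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the paragraph above (USD billion); 9 years is an assumption.
projected_2033 = project_value(4.8, 0.121, 9)
rate_from_endpoints = implied_cagr(4.8, 13.5, 9)

print(f"Projected 2033 value: {projected_2033:.2f}")
print(f"CAGR implied by 4.8 -> 13.5: {rate_from_endpoints:.2%}")
```

Under this assumption the projection comes out slightly below the quoted USD 13.5 billion and the implied rate slightly above 12.1%, which is consistent with the report describing its 2033 figure as approximate.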
One of the major growth drivers for the Continuous Data Protection market is the exponential increase in data generation and digital transformation initiatives worldwide. Enterprises are generating massive volumes of data from a variety of sources, including IoT devices, cloud applications, and mobile endpoints. This surge in data, coupled with the critical need to ensure business continuity, has heightened the demand for CDP solutions. Unlike traditional backup systems, CDP offers real-time or near-real-time backup, minimizing data loss and enabling rapid recovery in the event of system failures or cyber incidents. As organizations become more data-centric, the adoption of continuous data protection technologies is expected to accelerate, particularly among sectors that handle sensitive or mission-critical information.
Another significant factor fueling the growth of the Continuous Data Protection market is the evolving regulatory landscape. Governments and regulatory bodies across the globe are implementing stringent data protection and privacy regulations, such as GDPR in Europe and CCPA in California. These regulations require organizations to maintain robust data protection strategies, including real-time backup and rapid recovery capabilities. As non-compliance can lead to severe financial penalties and reputational damage, enterprises are increasingly turning to CDP solutions to ensure adherence to these mandates. The ability of CDP to provide point-in-time recovery and granular restoration of data aligns perfectly with regulatory requirements, further boosting market adoption.
Technological advancements and integration with cloud platforms are also shaping the trajectory of the Continuous Data Protection market. Modern CDP solutions are leveraging artificial intelligence, machine learning, and automation to enhance data backup, anomaly detection, and threat response. The proliferation of hybrid and multi-cloud environments has necessitated the development of CDP solutions that can seamlessly protect data across on-premises and cloud infrastructures. This trend is particularly prominent among large enterprises and organizations with distributed IT environments. Furthermore, the growing awareness of the financial and operational impacts of data loss is prompting even small and medium-sized enterprises to invest in continuous data protection, thus expanding the market’s addressable base.
From a regional perspective, North America continues to dominate the Continuous Data Protection market due to its advanced IT infrastructure, high adoption of cloud computing, and heightened focus on cybersecurity. However, the Asia Pacific region is witnessing the fastest growth, attributed to rapid digitalization, increasing investments in IT security, and rising awareness about data protection among enterprises. Europe also holds a significant market share, driven by strict data privacy regulations and a mature enterprise landscape. The Middle East & Africa and Latin America are emerging markets, where growing digital transformation and regulatory developments are expected to create new opportunities for CDP vendors in the coming years.
The Continuous Data Protection market is segmented by component into software, hardware, and services. The software segment holds the largest share, accounting for more than 55% of the global m
The ALTUS Cloud Electrification Study (ACES) was based at the Naval Air Facility Key West in Florida. During August 2002, ACES researchers conducted overflights of thunderstorms over the southwestern corner of Florida. For the first time in NASA research, an uninhabited aerial vehicle (UAV) named ALTUS was used to collect cloud electrification data. Carrying field mills, optical sensors, electric field sensors and other instruments, ALTUS allowed scientists to collect cloud electrification data for the first time from above the storm, from its birth through dissipation. This experiment allowed scientists to achieve the dual goals of gathering weather data safely and testing new aircraft technology. This dataset consists of data collected from seven instruments: the Slow/Fast antenna, Electric Field Mill, Dual Optical Pulse Sensor, Searchcoil Magnetometer, Accelerometers, Gerdien Conductivity Probe, and the Fluxgate Magnetometer. Data consist of sensor readings at 50 Hz throughout the flight from all 64 channels.
This data release includes water-quality data collected at up to thirteen locations along the Merrimack River and Merrimack River Estuary in Massachusetts. In this study, conducted by the U.S. Geological Survey (USGS) in cooperation with the Massachusetts Department of Environmental Protection, discrete samples were collected, and continuous monitoring was completed from June to September 2020. The data include results of measured field properties (water temperature, specific conductivity, pH, dissolved oxygen) and laboratory concentrations of nitrogen and phosphorus species, total carbon, pheophytin-a, and chlorophyll-a. These data were collected to assess selected (mainly nutrients) water-quality conditions in the Merrimack River and Merrimack River Estuary at the thirteen locations and identify areas where more water-quality monitoring is needed. The discrete samples and continuous-monitoring data are also available in the USGS National Water Information System at https://waterdata.usgs.gov/nwis. This data release consists of (1) Table of the discrete water-quality data collected (Merrimack_DiscreteWQ_Data.csv); (2) Statistical summaries including the minimum, median, and maximum of the discrete water-quality data collected (Merrimack_DiscreteWQ_Statistical_Data.original.csv); (3) Statistical summaries including the minimum, median, and maximum of the continuous water-quality data collected (Merrimack_ContinuousWQ_Statistical_Data.csv); (4) Table of vertical profile data (Merrimack_VerticalWQ_Profiles_Data.csv); (5) Table of continuous monitor deployment location and dates (Merrimack_ContinuousWQ_Deployment_Dates.csv); (6) Time-series plots of continuous water-quality data (Continuous_QW_Plots_All.zip); (7) Vertical profile plots (Vertical Profiles_QW_Plots.zip).
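The kind of summary held in the statistical files (minimum, median and maximum per measured property) can be sketched as follows. The parameter names and values are hypothetical and are not taken from the release, and the actual column layout of the CSV files is not described here.

```python
from statistics import median


def summarize_by_parameter(records):
    """Compute min, median and max for each parameter in (parameter, value) pairs."""
    grouped = {}
    for parameter, value in records:
        grouped.setdefault(parameter, []).append(value)
    return {
        parameter: {"min": min(values), "median": median(values), "max": max(values)}
        for parameter, values in grouped.items()
    }


# Hypothetical measurements for illustration only.
records = [
    ("pH", 7.1),
    ("pH", 7.4),
    ("pH", 6.9),
    ("water_temperature_c", 21.5),
    ("water_temperature_c", 23.0),
]

summary = summarize_by_parameter(records)
for parameter, stats in summary.items():
    print(parameter, stats)
```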
Access to up-to-date socio-economic data is a widespread challenge in Vanuatu and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.
For Vanuatu, data were collected for December 2023 – January 2025, with approximately 1000 households in the sample each month. The sample is representative of urban and rural areas but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in Vanuatu. There is one data file for household-level data with a unique household ID, and a separate file for individual-level data within each household, which can be matched to the household file using the household ID. The individual file also has a unique individual ID within the household, which can be used to track individuals over time within households where the data are panel data.
National, urban and rural. Six provinces were covered by this survey: Sanma, Shefa, Torba, Penama, Malampa and Tafea.
Household and individuals.
Sample survey data [ssd]
The Vanuatu High Frequency Phone Survey (HFPS) sample is drawn from the list of customer phone numbers (MSIDNS) provided by Digicel Vanuatu, one of the country’s two main mobile providers. Digicel’s customer base spans all regions of Vanuatu. For the initial data collection, Digicel filtered their MSIDNS database to ensure a representative distribution across regions. Recognizing the challenge of reaching low-income respondents, Digicel also included low-income areas and customers with a low-income profile (defined by monthly spending between 50 and 150 VT), as well as those with only incoming calls or using the IOU service without repayment. These filtered lists were then randomized, and enumerators began calling the numbers.
This approach was used to complete the first round of 1,000 interviews. The respondents from this first round formed a panel to be surveyed monthly. Each month, phone numbers from the panel are contacted until all have been interviewed, at which point new phone numbers (fresh MSIDNS from Digicel’s database) are used to replace those that have been exhausted. These new respondents are then added to the panel for future surveys.
Computer Assisted Telephone Interview [cati]
The questionnaire was developed in both English and Bislama. Sections of the Questionnaire:
-Interview Information
-Household Roster (separate modules for new households and returning households)
-Labor (separate modules for new households and returning households)
-Food Security
-Household Income
-Agriculture
-Social Protection
-Access to Services
-Assets
-Perceptions
-Follow-up
At the end of data collection, the raw dataset was cleaned by the survey firm and the World Bank team. Data cleaning mainly included formatting, relabeling, and excluding survey monitoring variables (e.g., interview start and end times). Data was edited using the software STATA.
The data are presented in two datasets: a household dataset and an individual dataset. The total number of observations is 13,779 in the household dataset and 77,501 in the individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (hhid_mem) can be found in the individual dataset.
In November 2024, a total of 7,874 calls were made. Of these, 2,251 calls were successfully connected, and 1,000 respondents completed the survey. By February 2024, the sample was fully comprised of returning respondents, with a re-contact rate of 99.9 percent.
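The November 2024 call figures translate into response rates as sketched below. The three rate names are labels chosen here for illustration; the documentation itself reports only the raw counts.

```python
def call_outcome_rates(calls_made, calls_connected, interviews_completed):
    """Derive simple outcome rates from raw call counts."""
    return {
        "connection_rate": calls_connected / calls_made,
        "completion_rate_among_connected": interviews_completed / calls_connected,
        "overall_completion_rate": interviews_completed / calls_made,
    }


# Counts reported for November 2024.
rates = call_outcome_rates(
    calls_made=7874, calls_connected=2251, interviews_completed=1000
)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

In other words, roughly 29 percent of calls connected, about 44 percent of connected calls led to a completed interview, and about 13 percent of all calls ended in a completed interview.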
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
{# General information
# The script runs with R (Version 3.1.1; 2014-07-10) and packages plyr (Version 1.8.1), XLConnect (Version 0.2-9), utilsMPIO (Version 0.0.25), sp (Version 1.0-15), rgdal (Version 0.8-16), tools (Version 3.1.1) and lattice (Version 0.20-29)
# --------------------------------------------------------------------------------
# Questions can be directed to: Martin Bulla (bulla.mar@gmail.com)
# --------------------------------------------------------------------------------
# Data collection and how the individual variables were derived is described in:
# Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016-20131016.
# Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015.
# Data are available as Rdata file
# Missing values are NA.
# --------------------------------------------------------------------------------
# For better readability the subsections of the script can be collapsed
# --------------------------------------------------------------------------------
}
{# Description of the method
# 1 - data are visualized in an interactive actogram with time of day on the x-axis and one panel for each day of data
# 2 - a red rectangle indicates the active field; clicking with the mouse in that field on the depicted light signal generates a data point that is automatically (via a custom-made function) saved in the csv file. For this data extraction I recommend always clicking on the bottom line of the red rectangle, as there is always data available there due to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if a greenish vertical bar appears and a new line of data appears in the R console.
# 3 - to extract incubation bouts, the first click in the new plot has to be the start of incubation; the next click marks the end of incubation and, at the same point, the start of incubation for the other sex. If the end and start of incubation are at different times, the data will still be extracted, but the sex, logger and bird_ID will be wrong. These need to be changed manually in the csv file. Similarly, the first bout for a given plot will always be assigned to the male (if no data are present in the csv file) or based on previous data. Hence, whenever data from a new plot are extracted, it is worth checking at the first mouse click whether the sex, logger and bird_ID information is correct and, if not, adjusting it manually.
# 4 - if all information from one day (panel) is extracted, right-click on the plot and choose "stop". This will activate the following day (panel) for extraction.
# 5 - if you wish to end extraction before going through all the rectangles, just press "escape".
}
{# Annotations of data-files from turnstone_2009_Barrow_nest-t401_transmitter.RData
dfr -- contains raw data on signal strength from the radio tag attached to the rump of the female and male, and information about when the birds were captured and the incubation stage of the nest
1. who: identifies whether the recording refers to female, male, capture or start of hatching
2. datetime_: date and time of each recording
3. logger: unique identity of the radio tag
4. signal_: signal strength of the radio tag
5. sex: sex of the bird (f = female, m = male)
6. nest: unique identity of the nest
7. day: datetime_ variable truncated to year-month-day format
8. time: time of day in hours
9. datetime_utc: date and time of each recording, but in UTC time
10. cols: colors assigned to "who"
--------------------------------------------------------------------------------
m -- contains metadata for a given nest
1. sp: identifies species (RUTU = Ruddy turnstone)
2. nest: unique identity of the nest
3. year_: year of observation
4. IDfemale: unique identity of the female
5. IDmale: unique identity of the male
6. lat: latitude coordinate of the nest
7. lon: longitude coordinate of the nest
8. hatch_start: date and time when the hatching of the eggs started
9. scinam: scientific name of the species
10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)
11. logger: type of device used to record incubation (IT - radio tag)
12. sampling: mean incubation sampling interval in seconds
--------------------------------------------------------------------------------
s -- contains metadata for the incubating parents
1. year_: year of capture
2. species: identifies species (RUTU = Ruddy turnstone)
3. author: identifies the author who measured the bird
4. nest: unique identity of the nest
5. caught_date_time: date and time when the bird was captured
6. recapture: was the bird captured before? (0 - no, 1 - yes)
7. sex: sex of the bird (f = female, m = male)
8. bird_ID: unique identity of the bird
9. logger: unique identity of the radio tag
--------------------------------------------------------------------------------
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
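The three-column input file from Step 1 can also be produced programmatically rather than through Excel. The sketch below uses Python's standard csv module and is not part of the authors' protocol. The column names come from the text above, while the replicate labels, the condition label `WT` and all values are hypothetical.

```python
import csv
import io

# Column names as described in Step 1.
FIELDNAMES = ["Replicate", "Condition", "Value"]


def write_dataset(rows, handle):
    """Write rows (dicts keyed by FIELDNAMES) as a .csv table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical measurements: two replicates, a wild type and mutants A and B.
rows = [
    {"Replicate": "Jan2016", "Condition": "WT", "Value": 12.4},
    {"Replicate": "Jan2016", "Condition": "A", "Value": 7.9},
    {"Replicate": "Jan2016", "Condition": "B", "Value": 15.2},
    {"Replicate": "Feb2016", "Condition": "WT", "Value": 11.8},
    {"Replicate": "Feb2016", "Condition": "A", "Value": 8.3},
    {"Replicate": "Feb2016", "Condition": "B", "Value": 14.6},
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_dataset(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To create a file on disk, replace the in-memory buffer with `open("input.csv", "w", newline="")`; the file name is only a placeholder.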
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average Nominal Wages: Usual Earnings: Central West: Government Sector data was reported at 5,017.000 BRL in Mar 2019. This records an increase from the previous figure of 4,816.000 BRL for Dec 2018. The series is updated quarterly, with a median of 4,102.000 BRL from Mar 2012 to Mar 2019 across 29 observations. The data reached an all-time high of 5,017.000 BRL in Mar 2019 and a record low of 3,240.000 BRL in Mar 2012. The series remains in active status in CEIC and is reported by the Brazilian Institute of Geography and Statistics. The data is categorized under Brazil Premium Database’s Labour Market – Table BR.GBD001: Continuous National Household Sample Survey: Average Nominal Wages: Usual Earnings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
As per our latest research, the global Continuous Data Protection Platform market size reached USD 5.8 billion in 2024, with a robust compound annual growth rate (CAGR) of 11.7%. This dynamic market is primarily driven by the increasing need for real-time data backup and recovery solutions across industries. By 2033, the market is forecasted to reach USD 15.9 billion, highlighting significant expansion opportunities. The surge in cyber threats, stringent regulatory requirements, and the exponential growth in enterprise data volumes are among the key factors fueling this upward trajectory.
One of the most significant growth factors for the Continuous Data Protection Platform market is the escalating frequency and sophistication of cyberattacks and ransomware incidents globally. Organizations are increasingly recognizing the limitations of traditional backup solutions, which often leave critical gaps between scheduled backups, resulting in potential data loss. Continuous Data Protection (CDP) platforms address these vulnerabilities by capturing and saving every change made to data in real time, thereby ensuring near-zero data loss and enabling rapid recovery. As businesses become more data-centric and digital transformation accelerates, the demand for robust data protection mechanisms such as CDP is expected to rise substantially, especially in sectors like banking, healthcare, and retail where data integrity and availability are paramount.
Another pivotal driver is the growing adoption of cloud computing and hybrid IT environments. Enterprises are migrating their workloads to cloud platforms for scalability, cost efficiency, and flexibility, but this shift also introduces new data protection challenges. CDP solutions are evolving to seamlessly integrate with public, private, and hybrid cloud infrastructures, providing unified data protection across diverse environments. The ability to protect data regardless of where it resides—on-premises, in the cloud, or within edge devices—makes CDP platforms indispensable for modern organizations. Moreover, regulatory frameworks such as GDPR, HIPAA, and CCPA are compelling organizations to invest in advanced data protection solutions that ensure compliance and minimize the risk of penalties associated with data breaches.
The proliferation of data generated by emerging technologies such as IoT, artificial intelligence, and big data analytics is further propelling the demand for Continuous Data Protection Platforms. Enterprises are dealing with unprecedented data volumes and velocities, which necessitate always-on backup and instant recovery capabilities. CDP platforms not only provide continuous data capture but also facilitate granular recovery options, enabling businesses to restore data to any point in time. This is particularly valuable in environments with mission-critical applications, where even minimal downtime or data loss can have severe operational and financial repercussions. As a result, the integration of CDP solutions is becoming a strategic priority for organizations aiming to enhance their data resilience and business continuity frameworks.
From a regional perspective, North America currently dominates the Continuous Data Protection Platform market, accounting for the largest revenue share in 2024. The region’s leadership is attributed to the presence of major technology vendors, early adoption of advanced IT security solutions, and a highly regulated business environment. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitalization, increasing cloud adoption, and rising awareness about data protection among enterprises and government bodies. Europe also holds a significant market share, supported by stringent data protection regulations and a mature IT infrastructure. The Middle East & Africa and Latin America are witnessing steady growth as organizations in these regions invest in modernizing their data protection strategies to support digital transformation initiatives.
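The market-size figures quoted above (USD 5.8 billion in 2024, an 11.7% CAGR, and USD 15.9 billion by 2033) can be sanity-checked with the standard compound-growth formula. The sketch below assumes nine annual compounding periods from 2024 to 2033, which the text does not state. Under that assumption the projection lands slightly below the quoted 2033 figure, so the report may use a different base year or rounding.

```python
def project(value, rate, years):
    """Compound a starting value forward at a constant annual rate."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Constant annual rate that grows `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


start_2024 = 5.8     # USD billion, from the text
target_2033 = 15.9   # USD billion, from the text
quoted_cagr = 0.117  # from the text
years = 2033 - 2024  # assumption: nine annual compounding periods

projected = project(start_2024, quoted_cagr, years)
rate = implied_cagr(start_2024, target_2033, years)

print(f"projected 2033 size at quoted CAGR: {projected:.2f} billion")
print(f"CAGR implied by the two endpoints: {rate:.2%}")
```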
The U.S. Geological Survey (USGS), in cooperation with the York County Planning Commission and York County Conservation District, collected discrete stream samples for analysis of suspended-sediment, total nitrogen, and total phosphorus concentrations at six real-time streamflow and water-quality monitoring (turbidity, nitrate, and specific conductance) stations in York County, Pennsylvania. Data were collected at these stations from 2019 to 2023 in order to predict suspended-sediment, total nitrogen, and total phosphorus concentrations from real-time continuous turbidity, nitrate, specific conductance, and streamflow. Regression equations were developed by relating discrete-sample suspended sediment to continuous turbidity; discrete-sample total nitrogen to continuous nitrate plus nitrite (also referred to as NOx) and specific conductance; and total phosphorus to continuous turbidity. Streamflow data were also used. Possible explanatory (independent) variables were continuous turbidity, continuous NOx, continuous specific conductance, streamflow, and calculated seasonal terms. The response (dependent) variables were suspended-sediment, total nitrogen, and total phosphorus concentration. Base-10 logarithmic (log10) or natural log (ln) transformations were applied as appropriate to both the discrete and continuous data.
Data files in .csv format for the suspended-sediment concentration (SSC), total phosphorus (TP), and total nitrogen (TN) models include the following variables: datetime (nearest 15 minutes); sample datetime (exact time of sample); suspended sediment; total phosphorus concentration (in milligrams per liter, mg/L as P); total nitrogen concentration (in milligrams per liter, mg/L as N); turbidity (in formazin nephelometric units, FNU); specific conductance (in microsiemens per centimeter at 25 degrees Celsius); NOx (nitrate plus nitrite, in mg/L as N); streamflow (Q, in cubic feet per second, cfs); and suspended-sediment particles classified as “fine” (smaller than 0.0625 millimeters, in percent). The files also include the log10-transformed variables of suspended-sediment and phosphorus concentrations, turbidity, and streamflow; the natural-log-transformed variables of total nitrogen, NOx, specific conductance, and streamflow; the calculated seasonal terms JulianDay (Julian Day, integer representation of day of year), sin(2π·JulianDay/365), cos(2π·JulianDay/365), sin(4π·JulianDay/365), and cos(4π·JulianDay/365); and sand (suspended-sediment particles classified as “sand”, 100 minus fines, in percent).
Models were developed for 6 stream sites at or near these USGS stream gages:
01573660 Fishing Creek at Goldsboro, PA
01574000 West Conewago Creek near Manchester, PA
01575598 Codorus Creek near Saginaw, PA
01576007 Kreutz Creek at Strickler, PA
01576045 Fishing Creek at Craley, PA
01577500 Muddy Creek at Castle Fin, PA
Data sources for models: for each of the six stations, the discrete and continuous data used to develop that station's models come from the same station.
First release: March 2024 (available from author)
Revised: May 2024 (ver. 2.0)
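The seasonal terms and log transformations described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the USGS regression: the turbidity and suspended-sediment values are synthetic, the function names are hypothetical, and the fit uses a single explanatory variable with plain least squares. It also omits any retransformation bias correction that a published model might apply when converting predictions back from log space.

```python
import math


def seasonal_terms(julian_day, period=365):
    """Sine/cosine terms for the annual (2*pi) and semi-annual (4*pi) cycles."""
    angle = 2 * math.pi * julian_day / period
    return {
        "sin2pi": math.sin(angle),
        "cos2pi": math.cos(angle),
        "sin4pi": math.sin(2 * angle),
        "cos4pi": math.cos(2 * angle),
    }


def fit_log10_line(x_values, y_values):
    """Least squares fit of log10(y) = intercept + slope * log10(x)."""
    lx = [math.log10(x) for x in x_values]
    ly = [math.log10(y) for y in y_values]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic turbidity (FNU) and suspended-sediment (mg/L) pairs, not USGS data.
turbidity = [2.0, 5.0, 12.0, 40.0, 150.0]
ssc = [3.0 * t ** 1.1 for t in turbidity]  # made-up power-law relation

intercept, slope = fit_log10_line(turbidity, ssc)
predicted_ssc = 10 ** (intercept + slope * math.log10(20.0))

print(seasonal_terms(91))
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"predicted SSC at 20 FNU: {predicted_ssc:.1f} mg/L")
```

In a fuller model the seasonal terms would enter as additional explanatory columns alongside the log-transformed turbidity and streamflow.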
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the Stan code for the continuous bounded item response theory (IRT) model, together with an empirical data set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average Nominal Wages: Usual Earnings: Ceará: Private Sector: Unregistered data was reported at 797.000 BRL in Mar 2019. This records a decrease from the previous figure of 806.000 BRL for Dec 2018. The series is updated quarterly, with a median of 692.000 BRL from Mar 2012 to Mar 2019 across 29 observations. It reached an all-time high of 856.000 BRL in Mar 2018 and a record low of 518.000 BRL in Sep 2012. The series remains in active status in CEIC and is reported by the Brazilian Institute of Geography and Statistics. It is categorized under Brazil Premium Database’s Labour Market – Table BR.GBD001: Continuous National Household Sample Survey: Average Nominal Wages: Usual Earnings.
Access to up-to-date socio-economic data is a widespread challenge in Solomon Islands and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.
For Solomon Islands, after five rounds of data collection from 2020-2020, a monthly HFPS data collection commenced in April 2023 and continued for 18 months (ending September 2024), on topics including employment, income, food security, health, food prices, assets and well-being. Fieldwork took place in two non-consecutive weeks of each month. Data for April 2023-December 2023 were a repeated cross section, while January 2024 established the first month of a panel, which continued to September 2024. Each month has approximately 550 households in the sample and is representative of urban and rural areas, but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in Solomon Islands. There is one data file for household-level data with a unique household ID, and a separate file for individual-level data. The individual file can be matched to the household file using the household ID, and it also has a unique individual ID within the household, which can be used to track individuals over time within households where the data are panel data.
Urban and rural areas of Solomon Islands.
Household, individual.
Sample survey data [ssd]
The initial sample was drawn through Random Digit Dialing (RDD) with geographic stratification. As an objective of the survey was to measure changes in household economic wellbeing over time, the HFPS sought to contact a consistent number of households across each province month to month. This was initially a repeated cross section from April 2023-Dec 2023. The initial sample was drawn from information provided by a major phone service provider in Solomon Islands, covering all the provinces in the country. It had a probability-based weighted design, with a proportionate stratification to achieve geographical representation. The geographical distribution compared to the 2019 Census is listed below for the first month of the HFPS monthly survey:
Choiseul: Census: 4.3%, HFPS: 5.2%
Western: Census: 14.4%, HFPS: 13.7%
Isabel: Census: 4.8%, HFPS: 4.7%
Central: Census: 3.6%, HFPS: 5.2%
Ren Bell: Census: 0.6%, HFPS: 1.4%
Guadalcanal: Census: 19.8%, HFPS: 21.1%
Malaita: Census: 23.1%, HFPS: 18.7%
Makira: Census: 5.6%, HFPS: 5.6%
Temotu: Census: 3.0%, HFPS: 3%
Honiara: Census: 20.7%, HFPS: 21.3%
Source: Census of Population and Housing 2019
Note: The values in the HFPS column represent the proportion of survey participants residing in each province, based on the raw HFPS data from April.
In April 2023, the geographic distribution of World Bank HFPS participants was generally similar to that of the census data at the province level, though within provinces, areas with less mobile phone connectivity are likely to be underrepresented. One indication of this is that urban areas constituted 38.2 percent of the survey sample, which is a slight overrepresentation, compared to 32.5 percent in the Census 2019.
A monthly panel was established in January 2024 and is ongoing as of March 2025. In each subsequent month after January 2024, the survey firm would first attempt to contact all households from the previous month and then attempt to contact households from earlier months that had dropped out. After previous numbers were exhausted, RDD with geographic stratification was used for replacement households. Across all months of the survey, a total of 9,926 interviews were completed.
Computer Assisted Telephone Interview [cati]
The questionnaire, which can be found in the External Resources of this documentation, is available in English, with Solomons Pijin translation. There were few changes to the questionnaire across the survey months, but some sections were only introduced in 2024, namely energy access questions and questions to inform the baseline data of the Solomon Islands Government Integrated Economic Development and Climate Resilience (IEDCR) project.
The raw data were cleaned by the World Bank team using STATA. This included formatting and correcting errors identified through the survey’s monitoring and quality control process. The data are presented in two datasets: a household dataset and an individual dataset. The total number of observations is 9,926 in the household dataset and 62,054 in the individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, food prices, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (id_member) can be found in the individual dataset.
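Matching the individual dataset to the household dataset can be sketched as a keyed lookup. Only `hhid` and `id_member` are variable names given in the documentation. The `month`, `urban`, and `age` fields and all row values below are hypothetical. The sketch also assumes that a household interviewed in several months appears once per month in the combined file, so the survey month is included in the join key alongside `hhid`.

```python
# Hypothetical rows; only hhid and id_member are variable names from the text.
households = [
    {"hhid": "H001", "month": "2024-01", "urban": 1},
    {"hhid": "H001", "month": "2024-02", "urban": 1},
    {"hhid": "H002", "month": "2024-01", "urban": 0},
]
individuals = [
    {"hhid": "H001", "month": "2024-01", "id_member": 1, "age": 34},
    {"hhid": "H001", "month": "2024-02", "id_member": 1, "age": 34},
    {"hhid": "H002", "month": "2024-01", "id_member": 1, "age": 52},
    {"hhid": "H999", "month": "2024-01", "id_member": 1, "age": 40},
]


def merge_on_household(household_rows, individual_rows, keys=("hhid", "month")):
    """Attach household fields to each individual row that has a match."""
    lookup = {tuple(h[k] for k in keys): h for h in household_rows}
    merged = []
    for person in individual_rows:
        household = lookup.get(tuple(person[k] for k in keys))
        if household is not None:  # unmatched individuals are skipped
            merged.append({**household, **person})
    return merged


merged = merge_on_household(households, individuals)

# Track one individual over time using hhid plus id_member.
history = [row["month"] for row in merged
           if row["hhid"] == "H001" and row["id_member"] == 1]
print(len(merged), history)
```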
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average Nominal Wages: Usual Earnings: Amapá: Private Sector: Registered data was reported at 1,544.000 BRL in Mar 2019. This records an increase from the previous figure of 1,525.000 BRL for Dec 2018. The series is updated quarterly, with a median of 1,354.000 BRL from Mar 2012 to Mar 2019 across 29 observations. It reached an all-time high of 1,559.000 BRL in Sep 2017 and a record low of 947.000 BRL in Mar 2012. The series remains in active status in CEIC and is reported by the Brazilian Institute of Geography and Statistics. It is categorized under Brazil Premium Database’s Labour Market – Table BR.GBD001: Continuous National Household Sample Survey: Average Nominal Wages: Usual Earnings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average Nominal Wages: Usual Earnings: Central West: Employers data was reported at 6,044.000 BRL in Mar 2019. This records an increase from the previous figure of 5,872.000 BRL for Dec 2018. The series is updated quarterly, with a median of 5,048.000 BRL from Mar 2012 to Mar 2019 across 29 observations. It reached an all-time high of 6,044.000 BRL in Mar 2019 and a record low of 4,022.000 BRL in Jun 2012. The series remains in active status in CEIC and is reported by the Brazilian Institute of Geography and Statistics. It is categorized under Brazil Premium Database’s Labour Market – Table BR.GBD001: Continuous National Household Sample Survey: Average Nominal Wages: Usual Earnings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average Nominal Wages: Usual Earnings: Amazonas: Private Sector data was reported at 1,607.000 BRL in Mar 2019. This records an increase from the previous figure of 1,491.000 BRL for Dec 2018. The series is updated quarterly, with a median of 1,417.000 BRL from Mar 2012 to Mar 2019 across 29 observations. It reached an all-time high of 1,607.000 BRL in Mar 2019 and a record low of 1,175.000 BRL in Dec 2012. The series remains in active status in CEIC and is reported by the Brazilian Institute of Geography and Statistics. It is categorized under Brazil Premium Database’s Labour Market – Table BR.GBD001: Continuous National Household Sample Survey: Average Nominal Wages: Usual Earnings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
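As a rough illustration of what a non-binning estimator for one discrete and one continuous variable can look like, the sketch below implements a k-nearest-neighbour estimate in pure Python. Treat it as an assumption-laden sketch rather than the authors' implementation: the function names are hypothetical, the data are synthetic, the continuous variable is one-dimensional, ties are not handled specially, and every label must occur more than `k` times. For each point it finds the distance to the k-th nearest neighbour with the same label, counts how many points of any label fall within that distance, and combines the counts through digamma terms.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def mi_discrete_continuous(labels, values, k=3):
    """k-nearest-neighbour MI estimate in nats for 1-D continuous values."""
    n = len(labels)
    label_counts = Counter(labels)
    total = 0.0
    for i in range(n):
        # Distance to the k-th nearest neighbour sharing this point's label.
        same = sorted(abs(values[i] - values[j])
                      for j in range(n) if j != i and labels[j] == labels[i])
        d = same[k - 1]
        # Points of any label within that distance, excluding the point itself.
        m = sum(1 for j in range(n)
                if j != i and abs(values[i] - values[j]) <= d)
        total += (digamma_int(n) - digamma_int(label_counts[labels[i]])
                  + digamma_int(k) - digamma_int(m))
    return total / n


rng = random.Random(0)
labels = [0] * 100 + [1] * 100
# Synthetic data: one case where the label fixes the value range, one unrelated.
separated = ([rng.gauss(0, 1) for _ in range(100)]
             + [rng.gauss(1000, 1) for _ in range(100)])
unrelated = [rng.gauss(0, 1) for _ in range(200)]

mi_separated = mi_discrete_continuous(labels, separated)
mi_unrelated = mi_discrete_continuous(labels, unrelated)
print(f"separated: {mi_separated:.3f} nats, unrelated: {mi_unrelated:.3f} nats")
```

With two equally likely labels, the fully separated case should come out close to ln 2 (about 0.693 nats), the entropy of the label, while the unrelated case should be near zero and may be slightly negative.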