This statistical note contains figures relating to tests and people who were tested under pillar 1 or pillar 2 of the government testing strategy.
Pillar 1 is swab testing in Public Health England (PHE) labs and NHS hospitals for those with a clinical need, and health and care workers.
Pillar 2 is swab testing for the wider population, through commercial partnerships.
https://digital.nhs.uk/services/data-access-request-service-dars
Data forming the COVID-19 Second Generation Surveillance System dataset relate to demographic and diagnostic information from Pillar 1 swab testing in PHE labs and NHS hospitals for those with a clinical need, and health and care workers, and Pillar 2 swab testing in the community (drive-through test centres, walk-in centres, home kits returned by post, care homes, prisons, etc.).
Timescales for dissemination can be found under 'Our Service Levels' at the following link: https://digital.nhs.uk/services/data-access-request-service-dars/data-access-request-service-dars-process
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19 Dataset for Correlation Between Early Government Interventions in the Northeastern United States and Peak COVID-19 Disease Burden, by Joel Mintz.
File type: Excel.
Contents:
Tab 1 ("Raw"): raw data as downloaded directly from the COVID Tracking Project, sorted by date.
Tabs 2-14 ("State Name"): data sorted by state.
Tabs 2-14 headers:
Column 1: Population per state, as recorded by the latest American Community Survey, and the maximum (peak) COVID-19 outcome with the date on which it occurred.
Column 2: Date on which numbers were recorded*
Column 3: State name*
Column 4: Number of reported positive COVID-19 tests*
Column 5: Number of reported negative COVID-19 tests*
Column 6: Pending COVID-19 tests*
Column 7: Currently hospitalized*
Column 8: Cumulatively hospitalized*
Column 9: Currently in ICU*
Column 10: Cumulatively in ICU*
Column 11: Currently on ventilator support*
Column 12: Cumulatively on ventilator support*
Column 13: Total recovered*
Column 14: Cumulative mortality*
*Provided in the original raw data.
Column 15: Total tests administered (Column 4 + Column 5)
Column 16: Placeholder
Column 17: Percentage of total population tested
Column 18: New cases per day
Column 19: Change in new cases per day
Column 20: Positive cases per day per capita, per hundred thousand: (Column 18 / total population) * 100,000
Column 21: Change in positive cases per day per capita, per hundred thousand: (Column 19 / total population) * 100,000
Column 22: Hospitalizations per day per capita, per hundred thousand
Column 23: Change in hospitalizations per day per capita, per hundred thousand
Column 24: Deaths per day per capita, per hundred thousand
Column 25: Change in deaths per day per capita, per hundred thousand
Columns 26-31: Columns 20-25 with an applied 5-day moving average filter
Column 32: Adjusted hospitalizations (number of hospitalizations minus the initial number of hospitalizations where reporting began)
Column 33: Adjusted hospitalizations per day per capita
Column 34: Adjusted hospitalizations per day per capita, with applied 5-day moving average filter
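As a rough illustration of the derived columns above (Columns 18, 20 and 26), the sketch below shows how they could be reproduced from the raw COVID Tracking Project fields; the DataFrame column names ("date", "positive") and the function name are our own assumptions, not the spreadsheet's exact headers.

```python
import pandas as pd

# Illustrative sketch of the per-capita and smoothed columns described above.
def add_derived_columns(state_df: pd.DataFrame, state_population: int) -> pd.DataFrame:
    df = state_df.sort_values("date").copy()
    # Column 18: new cases per day (first difference of cumulative positives).
    df["new_cases_per_day"] = df["positive"].diff()
    # Column 20: positive cases per day per 100,000 residents.
    df["new_cases_per_100k"] = df["new_cases_per_day"] / state_population * 100_000
    # Column 26: Column 20 smoothed with a 5-day moving average filter.
    df["new_cases_per_100k_ma5"] = df["new_cases_per_100k"].rolling(window=5).mean()
    return df
```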
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This publication was archived on 12 October 2023. Please see the Viral Respiratory Diseases (Including Influenza and COVID-19) in Scotland publication for the latest data.

This dataset provides information on the number of new daily confirmed cases, negative cases, deaths, testing by NHS Labs (Pillar 1) and UK Government (Pillar 2), new hospital admissions, new ICU admissions, and hospital and ICU bed occupancy from novel coronavirus (COVID-19) in Scotland, including cumulative totals and population rates at Scotland, NHS Board and Council Area levels (where possible). Seven-day positive cases and population rates are also presented by Neighbourhood Area (Intermediate Zone 2011). Information on how PHS publishes small area COVID figures is available on the PHS website.

Information on demographic characteristics (age, sex, deprivation) of confirmed novel coronavirus (COVID-19) cases, as well as trend data regarding the wider impact of the virus on the healthcare system, is provided in this publication. Data include information on primary care out-of-hours consultations, respiratory calls made to NHS24, contact with COVID-19 Hubs and Assessment Centres, incidents received by the Scottish Ambulance Service (SAS), as well as COVID-19 related hospital admissions and admissions to ICU (Intensive Care Unit). Further data on the wider impact of the COVID-19 response, focusing on hospital admissions, unscheduled care and volume of calls to NHS24, are available on the COVID-19 Wider Impact Dashboard.

Novel coronavirus (COVID-19) is a new strain of coronavirus first identified in Wuhan, China. Clinical presentation may range from mild-to-moderate illness to pneumonia or severe acute respiratory infection. COVID-19 was declared a pandemic by the World Health Organisation on 12 March 2020. We now have spread of COVID-19 within communities in the UK.

Public Health Scotland no longer reports the number of COVID-19 deaths within 28 days of a first positive test from 2nd June 2022. Please refer to NRS death certificate data as the single source for COVID-19 deaths data in Scotland.

In the process of updating the hospital admissions reporting to include reinfections, we have had to review existing methodology. In order to provide the best possible linkage of COVID-19 cases to hospital admissions, each admission record is required to have a discharge date, to allow us to better match the most appropriate COVID-positive episode details to an admission. This means that in cases where the discharge date is missing (either due to the patient still being treated, delays in discharge information being submitted, or data quality issues), it has to be estimated. Estimating a discharge date for historic records means that the average stay for those with missing dates is reduced, and fewer stays overlap with records of positive tests. The result of these changes is that approximately 1,200 historic COVID admissions have been removed due to improvements in methodology to handle missing discharge dates, while approximately 820 have been added to the cumulative total with the inclusion of reinfections.

COVID-19 hospital admissions are now identified as follows: a patient's first positive PCR or LFD test of the episode of infection (including reinfections at 90 days or more) for COVID-19 up to 14 days prior to admission to hospital, on the day of their admission, or during their stay in hospital.
If a patient's first positive PCR or LFD test of the episode of infection is after their date of discharge from hospital, they are not included in the analysis. Information on COVID-19, including stay at home advice for people who are self-isolating and their households, can be found on NHS Inform. Data visualisation of Scottish COVID-19 cases is available on the Public Health Scotland - Covid 19 Scotland dashboard. Further information on coronavirus in Scotland is available on the Scottish Government - Coronavirus in Scotland page, where further breakdown of past coronavirus data has also been published.
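As a rough illustration of that linkage rule (our sketch, not PHS's actual implementation), the check below flags an admission as COVID-19 related when the first positive test of the episode falls within 14 days before admission, on the admission date, or during the stay, and excludes tests dated after discharge.

```python
from datetime import date, timedelta

def is_covid_admission(first_positive_test: date, admission: date, discharge: date) -> bool:
    """Sketch of the admission linkage rule described above."""
    if first_positive_test > discharge:
        return False  # first positive test after discharge: not counted
    return first_positive_test >= admission - timedelta(days=14)

# Example: a test 10 days before admission counts; a test after discharge does not.
assert is_covid_admission(date(2022, 3, 1), date(2022, 3, 11), date(2022, 3, 20))
assert not is_covid_admission(date(2022, 3, 25), date(2022, 3, 11), date(2022, 3, 20))
```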
HTTPS://CPRD.COM/DATA-ACCESS
Second Generation Surveillance System (SGSS) is the national laboratory reporting system used in England to capture routine laboratory data on infectious diseases and antimicrobial resistance. SARS-CoV-2 testing started in UK laboratories on 24/02/2020, with the SGSS data reflecting testing (swab samples, PCR test method) offered to those in hospital and NHS key workers (i.e. Pillar 1). The CPRD-SGSS linked data currently contain positive test results only.
CPRD GOLD linked Second Generation Surveillance System (SGSS) data contain SARS-CoV-2 testing (swab samples, PCR test method) offered to those in hospital and NHS key workers (i.e. Pillar 1) and include positive test results only.
QResearch GP data are linked to Second Generation Surveillance System (SGSS) data, which contain SARS-CoV-2 testing (swab samples, PCR test method) offered to those in hospital and NHS key workers (i.e. Pillar 1) and include positive test results only.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This dataset contains COVID-19 positive confirmed cases aggregated by several different geographic areas and by day. COVID-19 cases are mapped to the residence of the individual and shown on the date the positive test was collected. In addition, 2016-2020 American Community Survey (ACS) population estimates are included to calculate the cumulative rate per 10,000 residents.
The dataset covers cases going back to 3/2/2020, when testing began. This data may not be immediately available for recently reported cases, and data will change as information becomes available. Data are updated daily.
Geographic areas summarized are: 1. Analysis Neighborhoods 2. Census Tracts 3. Census Zip Code Tabulation Areas
B. HOW THE DATASET IS CREATED Addresses from the COVID-19 case data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area for a given date.
The 2016-2020 American Community Survey (ACS) population estimates provided by the Census are used to create a cumulative rate, equal to ([cumulative count up to that date] / [acs_population]) * 10000, representing the number of total cases per 10,000 residents (as of the specified date).
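A minimal sketch of that rate calculation (the function name and example values are ours, not part of the dataset):

```python
def cumulative_rate_per_10k(cumulative_count: float, acs_population: float) -> float:
    """Cumulative cases per 10,000 residents, per the formula above."""
    return cumulative_count / acs_population * 10000

# Example: 250 cumulative cases in an area with an ACS population of 38,000
# -> roughly 65.8 cases per 10,000 residents.
print(cumulative_rate_per_10k(250, 38000))
```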
COVID-19 case data undergo quality assurance and other data verification processes and are continually updated to maximize completeness and accuracy of information. This means data may change for previous days as information is updated.
C. UPDATE PROCESS Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 05:00 Pacific Time.
D. HOW TO USE THIS DATASET San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
This dataset can be used to track the spread of COVID-19 throughout the city, in a variety of geographic areas. Note that the new cases column in the data represents the number of new cases confirmed in a certain area on the specified day, while the cumulative cases column is the cumulative total of cases in a certain area as of the specified date.
Privacy rules in effect. To protect privacy, certain rules are in effect: 1. Any area with a cumulative case count less than 10 is dropped for all days the cumulative count was less than 10. These will be null values. 2. Once an area has a cumulative case count of 10 or greater, that area will have a new row of case data every day following. 3. Cases are dropped altogether for areas where acs_population < 1000. 4. Deaths data are not included in this dataset for privacy reasons. The low COVID-19 death rate in San Francisco, along with other publicly available information on deaths, means that deaths data by geography and day are too granular and potentially risky. Read more in our privacy guidelines.
Rate suppression in effect where counts are lower than 20. Rates are not calculated unless the cumulative case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach, as this is best practice in epidemiology.
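The sketch below illustrates how those suppression rules might be applied to a table of daily counts; it is our own illustration, not SFDPH code, and the column names (cumulative_count, acs_population) are assumptions.

```python
import pandas as pd

def apply_suppression(df: pd.DataFrame) -> pd.DataFrame:
    """Null out counts below 10 and only compute rates once counts reach 20."""
    df = df.copy()
    # Privacy rule: areas with fewer than 10 cumulative cases are reported as null.
    counts = df["cumulative_count"].where(df["cumulative_count"] >= 10)
    # Rate suppression: rates are only calculated at 20 or more cumulative cases.
    rates = (counts / df["acs_population"] * 10000).where(counts >= 20)
    return df.assign(cumulative_count=counts, cumulative_rate_per_10k=rates)
```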
A note on Census ZIP Code Tabulation Areas (ZCTAs). ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are areal representations of routes. Read how the Census develops ZCTAs on their website.
Rows included for Citywide case counts. Rows are included for the Citywide case counts and incidence rate every day. These Citywide rows can be used for comparisons. Citywide will capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongoing data quality efforts result in improved mapping on a rolling basis.
Related dataset. See the dataset of the most recent cumulative counts for all geographic areas here: https://data.sfgov.org/COVID-19/COVID-19-Cases-and-Deaths-Summarized-by-Geography/tpyr-dvnc
E. CHANGE LOG
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Up-flow column percolation tests are used at laboratory scale to assess the leaching behavior of hazardous substances from contaminated soils under specific conditions as a function of time. Monitoring the quality of these test results between and within laboratories is crucial, especially if they are used for environment-related legal policy or for routine testing purposes. We tested three different sandy loam type soils (Soils I, II and III) to determine the reproducibility (inter-laboratory variability) of test results and to evaluate the difference in test results within a laboratory. Up-flow column percolation tests were performed following the procedure described in ISO/TS 21268–3. This procedure consists of percolating a solution (calcium chloride 1 mM) from bottom to top at a flow rate of 12 mL/h through softly compacted soil contained in a column of 5 cm diameter and 30 ± 5 cm height. Eluate samples were collected at liquid-to-solid ratios of 0.1, 0.2, 0.5, 1, 2, 5 and 10 L/kg and analyzed for quantification of the target elements (Cu, As, Se, Cl, Ca, F, Mg, DOC and B in this research). For Soil I, 17 institutions in Japan joined this validation test. The up-flow column experiments were conducted in duplicate, after 48 h of equilibration time and at a flow rate of 12 mL/h. Column percolation test results from Soils II and III were used to evaluate the difference in test results from experiments conducted in duplicate in a single laboratory, after 16 h of equilibration time and at a flow rate of 36 mL/h. Overall results showed good reproducibility (expressed in terms of the coefficient of variation, CV, calculated by dividing the standard deviation by the mean), as the CV was lower than 30% in more than 90% of the test results associated with Soil I. Moreover, low variability (expressed in terms of the difference between the two test results divided by their mean) was observed in the test results related to Soils II and III, with a variability lower than 30% in more than 88% of the cases for Soil II and in more than 96% of the cases for Soil III. We also discussed the possible factors that affect the reproducibility and variability of the test results from the up-flow column percolation tests. The low inter- and intra-laboratory variability obtained in this research indicates that ISO/TS 21268–3 can be successfully upgraded to a fully validated ISO standard.
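For reference, the two variability measures described above can be computed as in the small sketch below; the function names and inputs are ours, not part of ISO/TS 21268–3.

```python
import numpy as np

def coefficient_of_variation(replicates) -> float:
    """Inter-laboratory reproducibility: standard deviation divided by the mean, in %."""
    values = np.asarray(replicates, dtype=float)
    return values.std(ddof=1) / values.mean() * 100

def duplicate_variability(a: float, b: float) -> float:
    """Within-laboratory variability: difference between duplicates divided by their mean, in %."""
    return abs(a - b) / ((a + b) / 2) * 100
```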
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1: Table S1. The table provides a listing of all the resources used to collect missing historical testing data points. It consists of 7 columns: Country/State, First Data Point, Last Data Point, Language, Data Type, Test Type reported and the Source reference. The “First Data Point” and “Last Data Point” columns indicate the date of the first and last manually collected data points, respectively. The “Language” column indicates the original language of the resource. The “Data Type” column indicates the format of the data (API: application programming interface, Infographic: uploaded data that gets overridden daily, Daily reports, News reports, Graphs and Machine readable datasets). The “Test Type Reported” column indicates the method used to test for SARS-CoV-2: PCR, serological or unspecified. The “Source” provides the URL to the resource used.
CPRD Aurum linked Second Generation Surveillance System (SGSS) data contain SARS-CoV-2 testing (swab samples, PCR test method) offered to those in hospital and NHS key workers (i.e. Pillar 1) and include positive test results only.
https://www.datainsightsmarket.com/privacy-policy
The global Resonant Column Apparatus market, valued at $6.3 million in 2025, is projected to experience steady growth driven by increasing infrastructure development and advancements in geotechnical engineering. The construction, transportation, and marine engineering sectors are key application areas, relying on precise soil characterization for robust and safe infrastructure projects. The market's Compound Annual Growth Rate (CAGR) of 2% indicates a consistent, albeit moderate, expansion over the forecast period (2025-2033). This growth is fueled by the rising demand for accurate soil testing to optimize designs and minimize risks associated with ground instability. Technological advancements in resonant column apparatus, including enhanced data acquisition and analysis capabilities, are expected to further boost market adoption. Different types of apparatus, categorized by maximum lateral pressure (e.g., 1 MPa and 2 MPa), cater to diverse soil testing needs, contributing to market segmentation. While the market faces certain restraints, such as high initial investment costs and the availability of alternative testing methods, the long-term demand for reliable geotechnical data ensures a sustained market presence. Geographic distribution is expected to be relatively balanced, with North America and Europe holding significant market shares, while Asia-Pacific regions demonstrate potential for future growth due to rapid infrastructure development. The competitive landscape comprises both established players like GDS Instruments and Geocomp, and emerging companies focusing on innovative testing solutions. The market is likely to see consolidation and strategic partnerships in the coming years. Specific regional variations in growth rates will be influenced by factors such as government regulations, infrastructure spending, and the level of technological adoption within the geotechnical engineering community. Continuous improvements in the accuracy and efficiency of resonant column testing will be crucial for sustaining market growth, as well as promoting the technology's benefits to a broader range of applications and stakeholders. The market will likely witness increased focus on user-friendly interface design and data analysis software integration to expand market appeal and adoption.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Linearity studies.
https://creativecommons.org/publicdomain/zero/1.0/
When you develop a new WSD method or any NLP system that requires WSD, you can use this exercise to evaluate your system. Currently, few such exercises are available for evaluating WSD methods. This data set has been developed to make evaluation easier and faster. It contains 4105 sentences, each containing one or more polysemy words.
The dataset file is an Excel file (.xlsx). It has 3 columns: serial number (SN), sentence/context, and polysemy word. The useful columns are the second and third. The second column contains the sentences (or contexts); a sentence may contain more than one polysemy word. The third column specifies the target word (polysemy word) for the sentence in the second column. You can use the sentence to check whether your WSD system is able to disambiguate the target word in the third column within the corresponding sentence in the second column.
After you develop your new WSD system, follow these steps to evaluate it using this data set: 1) Add two columns named "correct_id" and "calculated_id". The correct_id is the word id of the correct sense of the target word in your database; store it in the 4th column. You have to fill this in manually for all sentences in the test data set. The calculated_id is the word id generated by your WSD system after sense disambiguation of the target word for the provided context sentence; store it in the 5th column. 2) Develop a module which loads this Excel file and reads the sentence (in the 2nd column) and the target word (in the 3rd column). 3) Provide the sentence as the context and the target word as the polysemy word to your system for disambiguation. 4) After sense disambiguation of the target word, write the calculated_id of the target word in the 5th column. 5) In the 6th column, provide a formula to check whether the correct_id and calculated_id match. If they match, write 1 in the corresponding cell of the 6th column; otherwise, write 0. 6) At the end of the 6th column, provide a sum formula to count the number of 1s. 7) Calculate the accuracy of your system using the formula: ((total number of 1s) / (total number of test sentences)) * 100.
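As an alternative to the spreadsheet formulas, the same accuracy can be computed programmatically. The sketch below is a minimal illustration and assumes column names (sentence, target_word, correct_id) and a user-supplied disambiguate() function, none of which are prescribed by the dataset itself.

```python
import pandas as pd

def evaluate_wsd(xlsx_path: str, disambiguate) -> float:
    """Return accuracy (%) of a WSD system over the evaluation spreadsheet."""
    df = pd.read_excel(xlsx_path)  # reading .xlsx requires openpyxl
    df["calculated_id"] = [
        disambiguate(sentence, word)                     # your WSD system
        for sentence, word in zip(df["sentence"], df["target_word"])
    ]
    matches = (df["calculated_id"] == df["correct_id"]).sum()
    return matches / len(df) * 100
```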
1) The first 2905 sentences are collected from online dictionaries, the web, etc. 2) The remaining 1200 sentences are collected from the "news" category of the Brown corpus.
This data set provides an easy-to-use WSD evaluation exercise containing 4105 sentences with 320 polysemy words' occurrences.
https://spdx.org/licenses/CC0-1.0.html
Background: Antibodies play a key role in the immune defence against infectious pathogens. Understanding the underlying process of B cell recognition is not only of fundamental interest; it supports important applications within diagnostics and therapeutics. Whereas the nature of conformational B cell epitope recognition is inherently complicated, linear B cell epitopes offer a straightforward approach that potentially can be reduced to one of peptide recognition.
Methods: Using an overlapping peptide approach representing the entire proteomes of the seven main coronaviruses known to infect humans, we analysed sera pooled from eight PCR-confirmed COVID-19 convalescents and eight pre-pandemic controls. Using a high-density peptide microarray platform, 13-mer peptides overlapping by 11 amino acids were synthesised in situ and incubated with the pooled primary serum samples, followed by development with secondary fluorochrome-labelled anti-IgG and anti-IgA antibodies. Interactions were detected by fluorescence. Strong Ig interactions encompassing consecutive peptides were considered to represent "high-fidelity regions" (HFRs). These were mapped to the coronavirus proteomes using a 60% homology threshold for clustering.
Results: We identified 333 human coronavirus derived HFRs. Among these, 98 (29%) mapped to SARS-CoV-2, 144 (44%) mapped to one or more of the four circulating common cold coronaviruses (CCC), and 54 (16%) cross-mapped to both SARS-CoV-2 and CCCs. The remaining 37 (11%) mapped to either SARS-CoV or MERS-CoV. Notably, the COVID-19 serum was skewed towards recognising SARS-CoV-2-mapped HFRs, whereas the pre-pandemic serum was skewed towards recognising CCC-mapped HFRs. In terms of absolute numbers of linear B cell epitopes, the primary targets are the ORF1ab protein (60%), the spike protein (21%), and the nucleoprotein (15%), in that order; however, in terms of epitope density, the order would be reversed.
Conclusion: We identified linear B cell epitopes across coronaviruses, highlighting pan-, alpha-, beta-, or SARS-CoV-2-corona-specific B cell recognition patterns. These findings could be pivotal in deciphering past and current exposures to epidemic and endemic coronaviruses. Moreover, our results suggest that pre-pandemic anti-CCC antibodies may cross-react against SARS-CoV-2, which could explain the highly variable outcome of COVID-19. Finally, the methodology used here offers a rapid and comprehensive approach to high-resolution linear B-cell epitope mapping, which could be vital for future studies of emerging infectious diseases.
Methods
Peptide microarray design
Peptide microarrays were designed using the proteomes of the seven human coronaviruses (HCoVs):
· HCoV-229E: 7 proteins
· HCoV-HKU1(N1): 8 proteins
· HCoV-NL63: 6 proteins
· HCoV-OC43: 9 proteins
· MERS-CoV: 9 proteins
· SARS-CoV: 14 proteins
· SARS-CoV-2: 13 proteins
The open reading frame (ORF) 1a was excluded from all the HCoV proteomes since ORF1ab covered these sequences. As a positive control pathogen, the entire proteome of the human cytomegalovirus (HCMV, strain AD169), consisting of 190 proteins, was included together with the entire proteome of the Zaire Ebola virus (strain Mayinga-76, EBOZM), consisting of 9 proteins. These 265 protein sequences were represented as 13 amino acid long peptides, overlapping by 11 amino acids and tiling by 2 amino acids, leading to a total of 66581 non-redundant virus-derived peptide sequences.
As a source of background-binding control, 3900 non-overlapping 13 amino acid peptide sequences were generated in silico using the amino acid frequencies from the 265 virus-derived proteins.
Peptide microarray synthesis and probing
The 66581 virus-derived peptide sequences in triplicate and the 3900 random background-binding peptides in duplicate were distributed randomly across 12 virtual sectors using proprietary software (PepArray, Schafer-N). Peptides were synthesised by Schafer-N (Copenhagen) on amino-functionalized glass microscope slides using maskless photolithographic light-directed solid-phase peptide synthesis. Peptide microarrays were incubated with convalescent COVID-19 or pre-pandemic sera diluted 1:100 in PBS (0.1% BSA, 0.1% Triton X-100) for 2 hours at room temperature, followed by washing and development using 1 µg/mL Cy3- or Cy5-labelled secondary antibodies against human IgG or IgA, respectively. After washing, microarrays were dried, scanned on a microarray laser scanner (INNOSCAN 900, Innopsys, France), quantified at an 8-bit resolution, and purged of artefacts using proprietary PepArray software.
Usage notes
Aggregated microarray data for all samples: microarray_data_aggregated.txt
Fluorescence intensity values for each peptide on the microarray are found in an aggregated tab-separated file. The first column contains the synthesised peptide sequences, and the second column contains the peptide group:
· Test: peptides derived from any one of the 9 virus strains
· Random: the random peptides
The remaining four columns contain the fluorescence intensity values extracted for the COVID-19 convalescent (pandemic) serum pool IgA and IgG and the pre-pandemic serum pool IgA and IgG.
Peptide to protein mapper file: peptide_map.txt
The peptide parent protein names and their locations are in a tab-separated file. The first column contains the synthesised peptide sequences, the second column contains the organism name, the third column is the UniProt ID of the parent protein, the fourth column is the abbreviated protein name, the fifth column is the length of the parent protein, and the sixth and seventh columns are the start and end coordinates of the peptide in its parent protein.
Combining files
microarray_data_aggregated.txt and peptide_map.txt can be combined using the "coresequence" column, representing the peptide sequences.
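A minimal sketch of that combining step (our illustration; only the file names and the "coresequence" join column come from the description above):

```python
import pandas as pd

# Load the aggregated intensities and the peptide-to-protein map, then join on
# the shared "coresequence" column as described above.
intensities = pd.read_csv("microarray_data_aggregated.txt", sep="\t")
peptide_map = pd.read_csv("peptide_map.txt", sep="\t")

combined = intensities.merge(peptide_map, on="coresequence", how="left")
```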
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VCF file containing filtered mutated sites in SARS-CoV-2 genomes obtained from GISAID EpiCoV, separated by individual mutations. The columns correspond to viral genome accession ID, nucleotide position in the genome, mutation ID (left blank in all rows), reference nucleotide, identified mutation, quality, filter, and information columns (all left blank), format (GT in all rows), column corresponding to reference genome (all 0, referring to reference nucleotide column), and columns corresponding to isolate genomes, with each row identifying the nucleotide in the POS column, and whether it is non-mutant (0), or the mutant indicated in the identified mutation column (1). The file is tab delimited, with 22546 rows including the names, and 30690 columns.
The file was generated to test the hypothesis whether the five most common mutations in the SARS-CoV-2 genome replication complex proteins, nsps 7, 8, 12, and 14, significantly affect the mutation density of the virus over time and whether these affect the synonymous and nonsynonymous mutation densities differently. We discovered that mutations in nsp14, an exonuclease with error correcting capabilities, are most likely to be correlated with increased mutational load across the genome compared to wildtype SARS-CoV-2. These results were obtained by identifying the frequency of mutations across all isolates in genomic regions of interest, analyzing which of the twenty mutations (five per nsp) have a statistically meaningful relationship with the mutation density in the M and E genes (chosen due to being under little selective pressure), and identifying the synonymous and nonsynonymous genomic SNV density for isolates with any of the statistically meaningful mutations, as well as isolates with none of the identified mutations.
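A small sketch of how this tab-delimited file might be loaded and the per-mutation isolate counts derived (our illustration; the file name is hypothetical, and it assumes the first row holds the column names as described above):

```python
import pandas as pd

# Load the tab-delimited, VCF-style table: the first nine columns are the fixed
# VCF fields, column ten is the reference genome, and the rest are isolate genomes.
df = pd.read_csv("filtered_mutations.vcf", sep="\t", low_memory=False)

# Count how many isolates carry each mutation by summing the 0/1 genotype columns.
isolate_cols = df.columns[10:]
mutation_counts = df[isolate_cols].apply(pd.to_numeric, errors="coerce").sum(axis=1)
```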
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
- user_id assigned to each user is consistent across each of the files (i.e., test windows in test_windows.csv for user_id == 10 correspond to user_id == 10 in para.csv, info.csv, etc.).
- Paradata collection was designed to operate asynchronously, ensuring that no user interactions were disrupted during data collection. As mLab was a browser-based technology, when users use browser navigation rapidly, there can be things that appear out of order (as we noted in our manuscripts).
- Dataframe column datatypes have been converted to satisfy specific analyses. Please check datatypes and convert as needed for your particular needs.
- Due to the sensitive nature of the survey data, the CSV/parquet files for survey are not included in this data repository. They will be made available upon reasonable request.
- For detailed descriptions of the study design and methodology, please refer to the associated publication.

## File Descriptions

### facts.csv / facts.parquet
This file records the educational facts shown to users.
_Column Name_: Description
display_timestamp: Unix timestamp when an educational fact was displayed to the user.
session_id: Unique identifier for the user’s session when the fact was shown.
user_id: Unique identifier for the user the fact was shown to.
fact_category: Category of the educational fact displayed to the user.
fact_index: Index number of the fact shown to the user.
fact_text: Text of the educational fact displayed.

### info.csv / info.parquet
This file contains user-specific metadata, and repeated data about each user (alerts and pinned facts).
_Column Name_: Description
user_id: Unique identifier for the user.
redcap_repeat_instrument: REDCap field indicating the repeat instrument used. For general information about the user (user_location and number_of_logins), redcap_repeat_instrument is blank. For repeated data (alerts, pinned facts, scheduled tests), redcap_repeat_instrument will identify the instrument.
redcap_repeat_instance: Instance number of the repeat instrument (if applicable).
user_location: Location of the user (if available). (1: New York City cohort; 2: Chicago cohort)
alert_date: A Unix timestamp of when an alert was sent to the user.
number_of_logins: Total number of logins by the user.
alert_subject: Subject or type of the alert sent.
alert_read: Indicates whether the alert was read by the user (1: True; 0: False).
end_date: Unix timestamp of the end date of scheduled tests.
start_date: Unix timestamp of the start date of scheduled tests.
fact_category: Category of the educational fact pinned by the user.
fact_index: Index number of the fact pinned by the user.
fact_text: Text of the educational fact pinned by the user.
fact_link: Link to additional information associated with the fact pinned by the user (if available).

### para.csv / para.parquet
This file includes paradata (detailed in-app user interactions) collected during the study.
_Column Name_: Description
timestamp: A timezone-naive timestamp of the user action or event.
session_id: Unique identifier for the user’s session.
user_id: Unique identifier for the user.
user_action: Specific user action (e.g., button press, page navigation). "[]clicked" indicates a pressable element (i.e., button, collapsible/expandable menu) is pressed.
current_page: Current page of the app being interacted with.
browser: Browser used to access the app.
platform: Platform used to access the app (e.g., Windows, iOS).
platform_description: Detailed description of the platform.
platform_maker: Manufacturer of the platform.
device_name: Name of the device used.
device_maker: Manufacturer of the device used.
device_brand_name: Brand name of the device used.
device_type: Type of device used (Mobile, Computer, etc.).
user_location: Location of the user (1: New York City cohort; 2: Chicago cohort).

### survey.csv / survey.parquet
This file contains survey responses collected from users.
*NOTE: Due to the sensitive nature of this data, CSV/parquet files are not included in this data repository. They will be made available upon reasonable request.*
_Column Name_: Description
user_id: Unique identifier for the user.
timepoint: Timepoint of the survey (baseline/0 months, 6 months, 12 months).
race: Race of the user.
education: Education level of the user.
health_literacy: Health literacy score of the user.
health_efficacy: Health efficacy score of the user.
itues_mean: Information Technology Usability Evaluation Scale (ITUES) mean score.
age: Age of the user.

### tests.csv / tests.parquet
This file contains data related to the HIV self-tests performed by users in the mLab App.
_Column Name_: Description
user_id: Unique identifier for the user that took the test.
visual_analysis_date: A Unix timestamp of the visual analysis of the test by the user.
visual_result: Result of the visual analysis (positive, negative).
mlab_analysis_date: A Unix timestamp of the analysis conducted by the mLab system.
mlab_result: Result from the mLab analysis (positive, negative).
signal_ratio: Ratio of the intensity of the test signal to the control signal.
control_signal: mLab-calculated intensity of the control signal.
test_signal: mLab-calculated intensity of the test signal.
browser: Browser used to access the app (from the User Agent string).
platform: Platform used to access the app (e.g., Windows, iOS) (from the User Agent string).
platform_description: Detailed description of the platform (from the User Agent string).
platform_maker: Manufacturer of the platform (from the User Agent string).
device_name: Name of the device used (from the User Agent string).
device_maker: Manufacturer of the device used (from the User Agent string).
device_brand_name: Brand name of the device used (from the User Agent string).
device_type: Type of device used (Mobile, Computer, etc.) (from the User Agent string).

### test_windows.csv / test_windows.parquet
This file contains information on testing windows assigned to users.
_Column Name_: Description
user_id: Unique identifier for the user.
redcap_repeat_instance: Instance of the repeat instrument.
start_date: Start date of the (hard) testing window.
end_date: End date of the (hard) testing window.

## Citation
If you use this dataset, please cite the associated mLab and mLab paradata publications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analytical reproducibility summary.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
=====================================================================
=====================================================================
Authors: Trung-Nghia Le (1), Khanh-Duy Nguyen (2), Huy H. Nguyen (1), Junichi Yamagishi (1), Isao Echizen (1)
Affiliations: (1)National Institute of Informatics, Japan (2)University of Information Technology-VNUHCM, Vietnam
National Institute of Informatics Copyright (c) 2021
Emails: {ltnghia, nhhuy, jyamagis, iechizen}@nii.ac.jp, {khanhd}@uit.edu.vn
Arxiv: https://arxiv.org/abs/2111.12888 NII Face Mask Dataset v1.0: https://zenodo.org/record/5761725
=============================== INTRODUCTION ===============================
The NII Face Mask Dataset is the first large-scale dataset targeting mask-wearing ratio estimation in street cameras. This dataset contains 581,108 face annotations extracted from 18,088 video frames (1920x1080 pixels) in 17 street-view videos obtained from Rambalac's YouTube channel.
The videos were taken in multiple places, at various times, before and during the COVID-19 pandemic. The total length of the videos is approximately 56 hours.
=============================== REFERENCES ===============================
If you publish using any of the data in this dataset, please cite the following papers:
@article{Nguyen202112888, title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio}, author={Nguyen, Khanh-Duy and Nguyen, Huy H and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao}, archivePrefix={arXiv}, arxivId={2111.12888}, url={https://arxiv.org/abs/2111.12888}, year={2021} }
@INPROCEEDINGS{Nguyen2021EstMaskWearing, author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao}, booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)}, title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio}, year={2021}, pages={1-8}, url={https://ieeexplore.ieee.org/document/9667046}, doi={10.1109/FG52635.2021.9667046}}
======================== DATA STRUCTURE ==================================
./NFM
├── dataset
│   ├── train.csv: annotations for the train set.
│   ├── test.csv: annotations for the test set.
└── README_v1.0.md
We use the same structure for the two CSV files (train.csv and test.csv). Both CSV files have the same columns:
<1st column>: video_id (the source video can be found by following the link: https://www.youtube.com/watch?v=)
<2nd column>: frame_id (the index of a frame extracted from the source video)
<3rd column>: timestamp in milliseconds (the timestamp of a frame extracted from the source video)
<4th column>: label (for each annotated face, one of three labels was attached with a bounding box: 'Mask'/'No-Mask'/'Unknown')
<5th column>: left
<6th column>: top
<7th column>: right
<8th column>: bottom
Four coordinates (left, top, right, bottom) were used to denote a face's bounding box.
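A minimal loading sketch, assuming the CSV files contain no header row (the column order above is used to assign names; nothing here beyond the file paths and column order is prescribed by the dataset):

```python
import pandas as pd

# Assign column names per the positional description above.
cols = ["video_id", "frame_id", "timestamp_ms", "label", "left", "top", "right", "bottom"]
train = pd.read_csv("NFM/dataset/train.csv", names=cols)

# Reconstruct the source video URL for each annotation.
train["video_url"] = "https://www.youtube.com/watch?v=" + train["video_id"].astype(str)

# Estimate the mask-wearing ratio among labelled faces (ignoring 'Unknown').
labelled = train[train["label"].isin(["Mask", "No-Mask"])]
mask_ratio = (labelled["label"] == "Mask").mean()
```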
============================== COPYING ================================
This repository is made available under Creative Commons Attribution License (CC-BY).
Regarding Creative Commons License: Attribution 4.0 International (CC BY 4.0), please see https://creativecommons.org/licenses/by/4.0/
THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE
====================== ACKNOWLEDGEMENTS ================================
This research was partly supported by JSPS KAKENHI Grants (JP16H06302, JP18H04120, JP21H04907, JP20K23355, JP21K18023), and JST CREST Grants (JPMJCR20D3, JPMJCR18A6), Japan.
This dataset is based on Rambalac's YouTube channel: https://www.youtube.com/c/Rambalac
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: Each column represents a regression. Columns (1) and (2) test the short-term (k = 1) β-convergence in RHE, and Columns (3) and (4) test the short-term (k = 1) β-convergence in Mortality. Robust standard errors in parentheses; ** significant at 5%, *** significant at 1%.