Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this work we present results of all the major global models and normalise the model results by looking at changes over time relative to a common base year value. We analyse the variability across the models, both before and after normalisation, to give insights into variance at national and regional level. A dataset of harmonised results (based on means) and measures of dispersion is presented, providing a baseline dataset for CBCA validation and analysis. The dataset is intended as a go-to dataset for country and regional results of consumption- and production-based accounts. The normalised mean for each country/region is the principal result that can be used to assess the magnitude and trend in the emission accounts. However, an additional key element of the dataset is the set of measures of robustness and spread of the results across the source models. These metrics give insight into how much trust should be placed in the individual country/region results. Code at https://doi.org/10.5281/zenodo.3181930
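The normalisation and harmonisation described above can be illustrated with a minimal sketch. It assumes that "normalising" means dividing each model's series by its own base-year value, and that the dispersion measures are the sample standard deviation and the coefficient of variation; the model names and numbers are hypothetical and are not taken from the dataset or from the linked code.

```python
from statistics import mean, stdev

def normalise_to_base_year(series, base_year):
    """Express each year's value relative to the model's own base-year value."""
    base = series[base_year]
    return {year: value / base for year, value in series.items()}

def harmonise(models, base_year):
    """Per-year mean, sample SD and coefficient of variation across models."""
    normalised = [normalise_to_base_year(s, base_year) for s in models.values()]
    # Only keep years that every model reports.
    years = sorted(set.intersection(*(set(n) for n in normalised)))
    summary = {}
    for year in years:
        values = [n[year] for n in normalised]
        m = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[year] = {"mean": m, "sd": sd, "cv": sd / m if m else float("nan")}
    return summary

# Hypothetical emission accounts (arbitrary units) from three hypothetical models.
models = {
    "model_a": {2000: 100.0, 2005: 110.0, 2010: 120.0},
    "model_b": {2000: 80.0, 2005: 92.0, 2010: 100.0},
    "model_c": {2000: 120.0, 2005: 126.0, 2010: 138.0},
}
summary = harmonise(models, base_year=2000)
```

In the base year every normalised series equals 1, so the spread there is zero by construction; the dispersion in later years then reflects disagreement about trends rather than about absolute levels.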
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Data from simulations of COVID-19 spread in Sweden under different public-health measures. Results from individual-based models.
Dataset description: This dataset contains background data and supplementary material for Sönning (forthcoming), a study that looks at the behavior of dispersion measures when applied to text-level frequency data. For the literature survey reported in that study, which examines how dispersion measures are used in corpus-based work, it includes tabular files listing the 730 research articles that were examined as well as annotations for those studies that measured dispersion in the corpus-linguistic (and lexicographic) sense. As for the corpus data that were used to train the statistical model parameters underlying the simulation study reported in that paper, the dataset contains a term-document matrix for the 49,604 unique word forms (after conversion to lower-case) that occur in the Brown Corpus. Further, R scripts are included that document in detail how the Brown Corpus XML files, which are available from the Natural Language Toolkit (Bird et al. 2009; https://www.nltk.org/), were processed to produce this data arrangement.
Abstract of the related publication: This paper offers a survey of recent corpus-based work, which shows that dispersion is typically measured across the text files in a corpus. Systematic insights into the behavior of measures in such distributional settings are currently lacking, however. After a thorough discussion of six prominent indices, we investigate their behavior on relevant frequency distributions, which are designed to mimic actual corpus data. Our evaluation considers different distributional settings, i.e. various combinations of frequency and dispersion values. The primary focus is on the response of measures to relatively high and low sub-frequencies, i.e. texts in which the item or structure of interest is over- or underrepresented (if not absent). We develop a simple method for constructing sensitivity profiles, which allow us to draw instructive comparisons among measures. We observe that these profiles vary considerably across distributional settings. While D and DP appear to show the most balanced response contours, our findings suggest that much work remains to be done to understand the performance of measures on items with normalized frequencies below 100 per million words.
https://dataverse.no/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.18710/FVHTFM
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Measures of central tendency and dispersion of the analyzed variables.
https://dataverse.no/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18710/ATCQZW
This dataset contains frequencies for a set of 150 word forms in the BNC. The set of items was compiled by Biber et al. (2016) for the purpose of analyzing the behavior of dispersion measures in different distributional settings. It was therefore assembled to cover a broad range of frequency and dispersion levels. For each form, the dataset lists (i) the number of occurrences in each of the 4049 text files in the BNC, including zero counts; and (ii) the length of each text file, i.e. the number of word tokens it contains.
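Per-text counts and text lengths of this kind are the inputs that text-level dispersion measures need. The sketch below is illustrative only and is not taken from the dataset's own scripts. It computes two common indices, Gries's deviation of proportions (DP) and Juilland's D, on hypothetical toy counts, using the population standard deviation for D.

```python
from math import sqrt

def dp(counts, lengths):
    """Gries's deviation of proportions: 0 means perfectly even,
    values near 1 mean the item is concentrated in a small part of the corpus."""
    total_count = sum(counts)
    total_length = sum(lengths)
    return 0.5 * sum(
        abs(c / total_count - n / total_length) for c, n in zip(counts, lengths)
    )

def juilland_d(counts, lengths):
    """Juilland's D on per-text relative frequencies: 1 means perfectly even,
    0 means all occurrences fall in a single text."""
    rates = [c / n for c, n in zip(counts, lengths)]
    k = len(rates)
    mean_rate = sum(rates) / k
    sd = sqrt(sum((r - mean_rate) ** 2 for r in rates) / k)  # population SD
    return 1 - (sd / mean_rate) / sqrt(k - 1)

# Hypothetical toy corpora: occurrences per text and text lengths in tokens.
even_counts, even_lengths = [2, 4, 6], [1000, 2000, 3000]
clumped_counts, clumped_lengths = [12, 0, 0, 0], [1000, 1000, 1000, 1000]

dp_even = dp(even_counts, even_lengths)
dp_clumped = dp(clumped_counts, clumped_lengths)
d_even = juilland_d(even_counts, even_lengths)
d_clumped = juilland_d(clumped_counts, clumped_lengths)
```

The two indices run in opposite directions: an evenly spread item has DP close to 0 and D close to 1, so they should not be compared without rescaling one of them.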
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We employ a cellular automaton to reconstruct the land use patterns of cities, which we characterize by two measures of spatial heterogeneity: (a) a variant of spatial entropy, which measures the spread of residential, business, and industrial activity sectors, and (b) an index of dissimilarity, which quantifies the degree of spatial mixing of these land use activity parcels. A minimalist, bottom-up approach is adopted that uses only three parameters, which represent the forces that determine the extent to which each of these sectors spatially aggregates into clusters. The dispersion degrees of the land uses are governed by a fixed, pre-specified power-law distribution based on empirical observations in other cities. Our method is then used to reconstruct land use patterns for the city state of Singapore and a selection of North American cities. We demonstrate the emergence of land use patterns that exhibit visual features comparable to the actual city maps defining our case studies whilst sharing similar spatial characteristics. Our work provides a complementary approach to other measures of urban spatial structure that differentiate cities by their land use patterns resulting from bottom-up dispersion and aggregation processes.
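To make the two heterogeneity measures more concrete, here is a minimal sketch using their textbook forms: Shannon entropy over land-use shares and the classic two-group index of dissimilarity over zones. The study describes its own variants, which may differ; the function names and the zone counts below are hypothetical.

```python
from math import log

def shannon_entropy(amounts):
    """Shannon entropy (natural log) of the shares of each land-use type."""
    total = sum(amounts)
    return -sum((a / total) * log(a / total) for a in amounts if a > 0)

def dissimilarity_index(a, b):
    """Classic index of dissimilarity between two land uses across zones:
    0 means identical spatial distributions, 1 means complete separation."""
    total_a, total_b = sum(a), sum(b)
    return 0.5 * sum(abs(x / total_a - y / total_b) for x, y in zip(a, b))

# Hypothetical parcel counts per zone for two land uses.
residential = [10, 0, 5, 5]
business = [0, 10, 5, 5]

entropy_equal = shannon_entropy([1, 1, 1])  # three equally common land uses
mixing = dissimilarity_index(residential, business)
```

Lower dissimilarity indicates more spatial mixing of the two land uses, whereas higher entropy indicates a more even split between land-use types.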
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The guidance identifies core personal and community-based public health measures to mitigate the transmission of coronavirus disease (COVID-19).
Multiple populations are ubiquitous in the old massive globular clusters (GCs) of the Milky Way. It is still unclear how they arose during the formation of a GC. The topic of iron and metallicity variations has recently attracted attention with the measurement of iron variations among the primordial population (P1) stars of Galactic GCs. We use the spectra of more than 8000 RGB stars in 21 Galactic GCs observed with MUSE to derive individual stellar metallicities [M/H]. For each cluster, we use the HST photometric catalogs to separate the stars into two main populations (P1 and P2). We measure the metallicity spread within the primordial population of each cluster by combining our metallicity measurements with the stars' $\Delta_{F275W,F814W}$ pseudo-color. We also derive metallicity dispersions ($\sigma_{[M/H]}$) for the P1 and P2 stars of each GC. In all but three GCs, we measure a significant correlation between the metallicity and the $\Delta_{F275W,F814W}$ pseudo-color of the P1 stars such that stars with larger $\Delta_{F275W,F814W}$ have higher metallicities. We measure metallicity spreads that range from 0.03 to 0.24 dex and correlate with the GC masses. As for the intrinsic metallicity dispersions, when combining the P1 and P2 stars, we measure values ranging from 0.02 dex to 0.08 dex that correlate very well with the GC masses. We compared the metallicity dispersions of the P1 and P2 stars and found that the P2 stars have metallicity dispersions that are smaller than or equal to those of the P1 stars. We find that both the metallicity spreads of the P1 stars (from the $\Delta_{F275W,F814W}$ spread in the chromosome maps) and the metallicity dispersions ($\sigma_{[M/H]}$) correlate with the GC masses, as predicted by some theoretical self-enrichment models presented in the literature.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Research data for the purpose of reproducing the results presented in the journal publication titled "Single-shot capable surface acoustic wave dispersion measurement of a layered plate"
Synthetic and real dispersion measurements for paths across the Pacific, consisting of 2 datasets:
SS3DPacific_new - This is a data set of surface-wave dispersion measurements. The dispersion is measured between a synthetic reference seismogram (computed with normal-mode summation using the MINEOS software in the radial model stw105 from Kustowski et al., 2008) and a real observed seismogram. This data set is used by Latallerie et al. (2024) to build a Vs model of the Pacific upper mantle with full 3D resolution and uncertainty using SOLA inversion (Zaroli 2016) and finite-frequency theory (Zhou 2009). Data are for a set of source-receiver pairs for frequencies ranging from 6 to 21 mHz, every 1 mHz. The measurement algorithm uses the multi-taper technique (Thomson 1982). The first 5 Slepians are used (Slepian 1978). A datum is the average of measurements over these tapers, and the uncertainty is the standard deviation.
SS3DPacificSyn_new - This is a data set of surface-wave dispersion measurements. The dispersion is measured between a synthetic reference seismogram (computed with normal-mode summation using the MINEOS software in the radial model stw105 from Kustowski et al., 2008) and a synthetic seismogram computed using the spectral element method software Specfem in the 3D model S362ANI from Kustowski et al. (2018). This data set is used by Latallerie et al. (2024) in a synthetic tomography study to retrieve the Vs structure of the input 3D model S362ANI in the Pacific upper mantle with full 3D resolution and uncertainty using SOLA inversion (Zaroli 2016) and finite-frequency theory (Zhou 2009). Measurements are provided for source-receiver pairs for frequencies ranging from 6 to 21 mHz, every 1 mHz. The measurement algorithm uses the multi-taper technique (Thomson 1982). The first 5 Slepians (Slepian 1978) are used. A datum is the average of measurements over these tapers, and the uncertainty is the standard deviation.
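As a minimal sketch of how a datum and its uncertainty follow from the per-taper measurements, the snippet below averages five hypothetical values for one source-receiver pair at one frequency. The description does not say whether the sample or the population standard deviation is used; the sample version is assumed here, and the numbers and function name are illustrative only.

```python
from statistics import mean, stdev

def combine_tapers(taper_measurements):
    """Return (datum, uncertainty) from one measurement per taper.

    Assumption: the uncertainty is the sample standard deviation (n - 1)."""
    datum = mean(taper_measurements)
    uncertainty = stdev(taper_measurements)
    return datum, uncertainty

# Hypothetical dispersion measurements from the first 5 Slepian tapers.
measurements = [0.12, 0.10, 0.11, 0.13, 0.09]
datum, uncertainty = combine_tapers(measurements)
```

With only five tapers the standard deviation is itself a noisy estimate, so it is best read as an indication of measurement scatter rather than as a precise error bar.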
A number of recent studies indicate a significant amount of ionized gas in the form of a hot gas halo around the Milky Way. The halo extends over a region of 100 kpc and may account for the missing baryon mass. In this paper we calculate the contribution of the proposed halo to the dispersion measure (DM) of pulsars. The Navarro, Frenk, and White (NFW), Maller and Bullock (MB), and Feldmann, Hooper, and Gnedin (FHG) density distributions are considered for the gas halo. The data set includes pulsars whose distances are known independently of the DM, e.g., pulsars in globular clusters, the LMC, and the SMC, and pulsars with known parallax. The results exclude the NFW distribution for the hot gas, while the more realistic MB and FHG models are compatible with the observed dispersion measure.
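For orientation, the dispersion measure is conventionally defined as the free-electron column density along the line of sight to a pulsar at distance $d$. The expression below is the standard textbook definition, not a formula quoted from the paper; the halo contribution is what results when the electron density $n_e$ is taken from one of the halo profiles (NFW, MB or FHG).

```latex
\mathrm{DM} = \int_{0}^{d} n_e(l)\,\mathrm{d}l
```

DM is usually quoted in units of pc cm$^{-3}$, which is why pulsars with independently known distances can be used to constrain the electron density along the path.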
ACTIVATE-FLEXPART contains the FLEXible PARTicle (FLEXPART) dispersion model back-trajectories ending at the HU-25 Falcon locations. ACTIVATE was a 5-year NASA Earth-Venture Sub-Orbital (EVS-3) field campaign. Marine boundary layer clouds play a critical role in Earth’s energy balance and water cycle. These clouds cover more than 45% of the ocean surface and exert a net cooling effect. The Aerosol Cloud meTeorology Interactions oVer the western Atlantic Experiment (ACTIVATE) project was a five-year project that provided important globally relevant data about changes in marine boundary layer cloud systems, atmospheric aerosols and multiple feedbacks that warm or cool the climate. ACTIVATE studied the atmosphere over the western North Atlantic and sampled its broad range of aerosol, cloud and meteorological conditions using two aircraft, the UC-12 King Air and HU-25 Falcon. The UC-12 King Air was primarily used for remote sensing measurements, while the HU-25 Falcon carried a comprehensive instrument payload for detailed in-situ measurements of aerosol, cloud properties, and atmospheric state. A few trace gas instruments were also onboard the HU-25 Falcon to measure pollution tracers, which contribute to airmass classification analysis. A total of 150 coordinated flights over the western North Atlantic took place during 6 deployments from 2020 to 2022. The ACTIVATE science observing strategy intensively targeted the shallow cumulus cloud regime and aimed to collect sufficient statistics over a broad range of aerosol and weather conditions to enable robust characterization of aerosol-cloud-meteorology interactions. This strategy was implemented by two nominal flight patterns: Statistical Survey and Process Study. The statistical survey pattern involved close coordination between the remote sensing and in-situ aircraft to conduct near-coincident sampling at and below cloud base as well as above and within cloud top. The process study pattern involved extensive vertical profiling to characterize the target cloud and surrounding aerosol and meteorological conditions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file provides the measured complex refractive indices (optical dispersions) of GST-225 for different fractions of the crystalline state, from fully amorphous to fully crystalline. The data were collected by following the crystallization in real time (from 250 seconds to >1000 seconds) while heating the GST at 145 °C.
DC3_TraceGas_AircraftInSitu_DC8_Data are in-situ trace gas data collected onboard the DC-8 aircraft during the Deep Convective Clouds and Chemistry (DC3) field campaign. Data collection for this product is complete.
The Deep Convective Clouds and Chemistry (DC3) field campaign sought to understand the dynamical, physical, and lightning processes of deep, mid-latitude continental convective clouds and to define the impact of these clouds on upper tropospheric composition and chemistry. DC3 was conducted from May to June 2012 with a base location of Salina, Kansas. Observations were conducted in northeastern Colorado, west Texas to central Oklahoma, and northern Alabama in order to provide a wide geographic sample of storm types and boundary layer compositions, as well as to sample convection.
DC3 had two primary science objectives. The first was to investigate storm dynamics and physics, lightning and its production of nitrogen oxides, cloud hydrometeor effects on wet deposition of species, surface emission variability, and chemistry in anvil clouds. Observations related to this objective focused on the early stages of active convection. The second objective was to investigate changes in upper tropospheric chemistry and composition after active convection. Observations related to this objective focused on the 12-48 hours following convection. This objective also served to explore seasonal change of upper tropospheric chemistry.
In addition to using the NSF/NCAR Gulfstream-V (GV) aircraft, the NASA DC-8 was used during DC3 to provide in-situ measurements of the convective storm inflow and remotely-sensed measurements used for flight planning and column characterization. DC3 utilized ground-based radar networks spread across its observation area to measure the physical and kinematic characteristics of storms. Additional sampling strategies relied on lightning mapping arrays, radiosondes, and precipitation collection. Lastly, DC3 used data collected from various satellite instruments to achieve its goals, focusing on measurements from CALIOP onboard CALIPSO and CPL onboard CloudSat. In addition to providing an extensive set of data related to deep, mid-latitude continental convective clouds and analyzing their impacts on upper tropospheric composition and chemistry, DC3 improved models used to predict convective transport. DC3 improved knowledge of convection and chemistry, and provided information necessary to understanding the processes relating to ozone in the upper troposphere.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a spectral dataset of natural objects and daylights collected in Japan.
We collected 359 natural objects and measured the reflectance of all objects and the transmittance of 75 leaves. We also measured daylight from dawn till dusk on four different days using a white plate placed (i) in direct sunlight and (ii) in a cast shadow (in total 359 measurements). We also separately measured daylight at five different locations (including a sports ground, a space between tall buildings and a forest) with minimal time intervals to reveal the influence of the surrounding environment on the spectral composition of daylight reaching the ground (in total 118 measurements).
If you use this dataset in your research, please cite the following publication.
The dataset contains the following Excel spreadsheets and CSV files:
(A) Surface properties of natural objects
(A-1) Reflectance_ver1-2.xlsx and .csv
(A-2) Transmittance_FrontSideUp_ver1-2.xlsx and .csv
(A-3) Transmittance_BackSideUp_ver1-2.xlsx and .csv
(B) Daylight measurements
(B-1) Daylight_TimeLapse_v1-2.xlsx and .csv
(B-2) Daylight_DifferentLocations_v1-2.xlsx and .csv
Data description
(A) Surface properties
(A-1) Reflectance_ver1-2.xlsx and .csv
This file contains surface spectral reflectance data (380 - 780 nm, 5 nm step) of 359 natural objects, including 200 flowers, 113 leaves, 23 fruits, 6 vegetables, 8 barks, and 9 stones measured by a spectrophotometer (SR-2A, Topcon, Tokyo, Japan). Photos of all samples are included in the .xlsx file.
For the analysis presented in the paper, we identified reflectance pairs that have a Pearson’s correlation coefficient across 401 spectral channels of more than 0.999 and removed one reflectance from each pair. The column 'Used in analysis' indicates whether or not each sample is used for the analysis (TRUE indicates used and FALSE indicates not used).
At the time of collection, we noted the scientific names of flowers, leaves and barks from name boards provided by the Tokyo Institute of Technology, where the samples were collected. If no name board was available, we used a smartphone app that automatically identifies the scientific name from an input image (PictureThis - Plant Identifier developed by Glority Global Group Ltd.). The names of 2 flowers and 9 stones whose names could not be identified through either method were left blank.
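The near-duplicate filtering described for the reflectance file can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes that the later sample of each highly correlated pair is the one dropped, and it uses short hypothetical spectra instead of the real 401-channel data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length spectra.
    Note: undefined (division by zero) for a perfectly flat spectrum."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def flag_used(spectra, threshold=0.999):
    """Return one boolean per spectrum, mimicking a 'Used in analysis' column.
    Assumption: within each highly correlated pair, the later spectrum is dropped."""
    used = [True] * len(spectra)
    for i in range(len(spectra)):
        if not used[i]:
            continue
        for j in range(i + 1, len(spectra)):
            if used[j] and pearson(spectra[i], spectra[j]) > threshold:
                used[j] = False
    return used

# Hypothetical reflectance spectra (4 channels each, for brevity).
spectra = [
    [0.10, 0.20, 0.30, 0.40],
    [0.20, 0.40, 0.60, 0.80],  # same shape as the first, so correlation is 1
    [0.40, 0.30, 0.20, 0.10],  # reversed shape, so correlation is -1
]
used = flag_used(spectra)
```

Because Pearson's correlation ignores overall scale, two samples with the same spectral shape but different brightness count as duplicates under this criterion.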
(A-2) Transmittance_FrontSideUp_v1-2.xlsx and .csv
This file contains surface spectral transmittance data (380 - 780 nm, 5 nm step) for 75 leaves measured by a spectrophotometer (SR-2A, Topcon, Tokyo, Japan). Photos of all samples are included in the .xlsx file.
For this data, the transmittance was measured with the front-side of leaves up (the light was transmitted from the back side of the leaves). This is the data presented in the associated article.
(A-3) Transmittance_BackSideUp_v1-2.xlsx and .csv
Spectral transmittance data of the same leaves presented in (A-2).
For this data, the transmittance was measured with the back-side of leaves up (the light was transmitted from the front side of the leaves).
(B) Daylight measurements
(B-1) Daylight_TimeLapse_ver1-2.xlsx and .csv
This file contains daylight spectra from sunrise to sunset on four different days (2013/11/20, 2013/12/24, 2014/07/03 and 2014/10/27) measured by a spectrophotometer (SR-LEDW, Topcon, Tokyo, Japan) with a wavelength range from 380 nm to 780 nm in 1 nm steps. We measured the light reflected from the white calibration plate placed either in direct sunlight or in a cast shadow.
The column 'Cloud cover' provides a visual estimate of the percentage of cloud cover across the sky at the time of each measurement. The column 'Red lamp' indicates whether an aircraft warning lamp at the measurement site was on (circle) or off (blank).
(B-2) Daylight_DifferentLocations_ver1-2.xlsx and .csv
This file includes daylight spectra measured at five different sites within the Suzukakedai Campus of Tokyo Institute of Technology with minimal time gaps on 2014/07/08, using a spectroradiometer (IM-1000, Topcon) from 380 nm to 780 nm in 1 nm steps. The instrument was oriented either towards the sun or towards the zenith sky. When the instrument was oriented towards the sun, we measured spectra in two ways: (i) with a black cylinder covering the photodetector and (ii) without the cylinder.
The column 'Cylinder' indicates whether the black cylinder was used (circle) or not (cross). The column 'Cloud cover' shows a visual estimate of the percentage of cloud cover at the time of each measurement. The column 'Sun hidden in clouds' denotes whether the measurement was taken when the sun was covered by clouds (circle) or not (blank).
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The COVID-19 India Containment Zone Classification dataset categorizes Indian districts into Red, Orange, and Green Zones based on COVID-19 case metrics as of May 4. This classification aids in understanding the spread and control of COVID-19 across different regions.
2) Data Utilization
(1) Characteristics of the COVID-19 India Containment Zone data:
• It includes detailed district-level information on the zone classification (Red, Orange, Green) based on COVID-19 metrics. This information is crucial for analyzing the spread of the virus, assessing the effectiveness of containment measures, and planning public health strategies.
(2) The COVID-19 India Containment Zone data can be used for:
• Public Health Management: Assists in resource allocation, planning containment measures, and implementing targeted lockdowns based on zone classification.
• Research and Analysis: Supports epidemiological studies, modeling the spread of the virus, and assessing the impact of containment measures in different zones.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19. (Source: WHO)
People can catch COVID-19 from others who have the virus. The disease has been spreading rapidly around the world, and Italy is one of the most affected countries.
On March 8, 2020, Italy’s prime minister announced a sweeping coronavirus quarantine early Sunday, restricting the movements of about a quarter of the country’s population in a bid to limit contagions at the epicenter of Europe’s outbreak. (Source: TIME)
This dataset is from https://github.com/pcm-dpc/COVID-19. The data is collected by Sito del Dipartimento della Protezione Civile - Emergenza Coronavirus: la risposta nazionale and is uploaded to that GitHub repository.
This dataset has two files:
- covid19_italy_province.csv: province-level data of COVID-19 cases
- covid_italy_region.csv: region-level data of COVID-19 cases
Dashboard on the data can be seen here. Picture courtesy is from the dashboard.
Insights on:
* Spread to various regions over time
* Predicting the spread of COVID-19 ahead of time so that preventive measures can be taken
Measures taken in relation to the cat to prevent the spread of COVID-19.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Infections caused by various bacterial pathogens in both clinical and community settings represent a significant threat to public healthcare worldwide. The growing resistance to antimicrobial drugs acquired by bacterial species causing healthcare-associated infections has already become a life-threatening danger recognized by the World Health Organization. Several groups or lineages of bacterial isolates, usually called 'high-risk clones', often drive the spread of resistance within particular species.
Thus, it is vitally important to reveal and track the spread of such clones and the mechanisms by which they acquire antibiotic resistance and enhance their ability to survive. Currently, the analysis of whole genome sequences for bacterial isolates of interest is increasingly used for these purposes, including epidemiological surveillance and the development of spread prevention measures. However, the availability and uniformity of the data derived from the genomic sequences often represent a bottleneck for such investigations.
In this dataset, we present the results of a genomic epidemiology analysis of 61,857 genomes of the dangerous bacterial pathogen Klebsiella pneumoniae obtained from the NCBI GenBank database. Important typing information including multilocus sequence typing (MLST)-based sequence types (STs), capsular (KL) and oligosaccharide (OL) types, CRISPR-Cas systems, and cgMLST profiles is presented, as well as the assignment of particular isolates to clonal groups (CG). The presence of antimicrobial resistance and virulence genes, as well as plasmid replicons, within the genomes is also reported.
These data will be useful for researchers in the field of K. pneumoniae genomic epidemiology, resistance analysis and prevention measure development.