Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The correlation coefficient is a commonly used criterion to measure the strength of a linear relationship between two quantitative variables. For a bivariate normal distribution, numerous procedures have been proposed for testing a precise null hypothesis of the correlation coefficient, whereas the construction of flexible procedures for testing a set of (multiple) precise and/or interval hypotheses has received less attention. This paper fills the gap by proposing an objective Bayesian testing procedure using divergence-based priors. The proposed Bayes factors can be used for testing any combination of precise and interval hypotheses and also allow a researcher to quantify evidence in the data in favor of the null or any other hypothesis under consideration. An extensive simulation study is conducted to compare the performance of the proposed Bayesian methods with that of some existing ones in the literature. Finally, a real-data example is provided for illustrative purposes.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: Clean water is essential to a healthy human life and wellbeing. More recently, rapid population growth, high illiteracy rates, a lack of sustainable development, and climate change have made this a global challenge in developing countries. The discontinuity of drinking water supply forces households either to use unsafe water storage materials or to use water from unsafe sources. The present study aimed to identify the determinants of water source types, use, quality of water, and sanitation perception of physical parameters among urban households in North-West Ethiopia.
Methods: A community-based cross-sectional study was conducted among households from February to March 2019. A pretested, structured, interview-based questionnaire was used to collect the data. Households were sampled randomly, in proportion to the number of households in each kebele. MS Excel and R version 3.6.2 were used to enter and analyze the data, respectively. Descriptive statistics using frequencies and percentages were used to explain the sample data concerning the predictor variables. Both bivariate and multivariate logistic regressions were used to assess the association between independent and response variables.
Results: Four hundred eighteen (418) households participated. Of these, 78.95% used improved and 21.05% used unimproved drinking water sources. Household drinking water source was significantly associated with the age of the participant (χ² = 20.392, df = 3), educational status (χ² = 19.358, df = 4), source of income (χ² = 21.777, df = 3), monthly income (χ² = 13.322, df = 3), availability of additional facilities (χ² = 98.144, df = 7), cleanness status (χ² = 42.979, df = 4), scarcity of water (χ² = 5.1388, df = 1), and family size (χ² = 9.934, df = 2). The logistic regression analysis also indicated that these factors significantly determine the water source types used by the households. Factors such as availability of a toilet facility, household member type, and sex of the head of the household were not significantly associated with drinking water sources.
Conclusion: The use of drinking water from improved sources was determined by different demographic, socio-economic, sanitation, and hygiene-related factors. Therefore, the local, regional, and national governments and other supporting organizations should improve the accessibility and adequacy of drinking water from improved sources in the area.
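The chi-square statistics and degrees of freedom reported in the Results above come from tests of independence on contingency tables. The following is a minimal sketch of that computation, not the authors' R code. The cell counts are invented for illustration: only the totals (418 households, 330 of them using improved sources) are chosen to agree with the abstract, and the function name is hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows are four age groups,
# columns are (improved source, unimproved source).
counts = [[90, 10], [100, 20], [80, 30], [60, 28]]
stat, df = chi_square_independence(counts)
print(f"chi-square = {stat:.3f}, df = {df}")
```

With four row categories and two column categories the degrees of freedom are (4 - 1) x (2 - 1) = 3, which is the pattern behind the df = 3 values reported for four-level predictors.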
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bivariate correlations of variables for students (n ≥ 250).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset for: Leipold, B. & Loepthien, T. (2021). Attentive and emotional listening to music: The role of positive and negative affect. Jahrbuch Musikpsychologie, 30. https://doi.org/10.5964/jbdgm.78 In a cross-sectional study, associations of global affect with two ways of listening to music – attentive–analytical listening (AL) and emotional listening (EL) – were examined. More specifically, the degrees to which AL and EL are differentially correlated with positive and negative affect were examined. In Study 1, a sample of 1,291 individuals responded to questionnaires on listening to music, positive affect (PA), and negative affect (NA). We used the PANAS, which measures PA and NA as high-arousal dimensions. AL was positively correlated with PA, EL with NA. Moderation analyses showed stronger associations between PA and AL when NA was low. Study 2 (499 participants) differentiated between three facets of affect and focused, in addition to PA and NA, on the role of relaxation. Similar to the findings of Study 1, AL was correlated with PA, EL with NA and PA. Moderation analyses indicated that the degree to which PA is associated with an individual's tendency to listen to music attentively depends on their degree of relaxation. In addition, the correlation between pleasant activation and EL was stronger for individuals who were more relaxed; for individuals who were less relaxed, the correlation between unpleasant activation and EL was stronger. In sum, the results demonstrate not only simple bivariate correlations, but also that the expected associations vary, depending on the different affective states. We argue that the results reflect a dual function of listening to music, which includes emotional regulation and information processing. Dataset: Study 2
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Objective(s): The 2024 Pediatric Sepsis Data Challenge provides an opportunity to address the lack of appropriate mortality prediction models for LMICs. For this challenge, we are asking participants to develop a working, open-source algorithm to predict in-hospital mortality and length of stay using only the provided synthetic dataset. The original data used to generate the real-world data (RWD) informed synthetic training set available to participants was obtained from a prospective, multisite, observational cohort study of children with suspected sepsis aged 6 months to 60 months at the time of admission to hospitals in Uganda. For this challenge, we have created a RWD-informed synthetically generated training data set to reduce the risk of re-identification in this highly vulnerable population. The synthetic training set was generated from a random subset of the original data (full dataset A) of 2686 records (70% of the total dataset - training dataset B). All challenge solutions will be evaluated against the remaining 1235 records (30% of the total dataset - test dataset C). Data Description: Report describing the comparison of univariate and bivariate distributions between the Synthetic Dataset and Test Dataset C. Additionally, a report showing the maximum mean discrepancy (MMD) and Kullback–Leibler (KL) divergence statistics. Synthetic training dataset and data dictionary for the synthetic dataset containing 138 variables. NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator at sepsiscolab@bcchr.ca or visit our website.
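The Kullback–Leibler (KL) divergence statistic mentioned in the data description above compares two distributions, for example the distribution of one categorical variable in the synthetic set against the same variable in the test set. The following is a minimal sketch for discrete distributions, not the challenge organizers' reporting code. The function names and the example counts are hypothetical.

```python
import math


def to_proportions(counts):
    """Convert raw category counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same categories."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


# Hypothetical category counts for one variable in two datasets
synthetic = to_proportions([120, 60, 20])
held_out = to_proportions([110, 70, 20])
print(f"KL(synthetic || held_out) = {kl_divergence(synthetic, held_out):.5f}")
```

A value near zero indicates that the two distributions are close. The divergence is not symmetric, so the order of the arguments matters.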
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Observed bivariate distribution of the number of times bacon and eggs were purchased on four consecutive shopping trips (see [23, 28]).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The paper considers some of the issues emerging from the discrete wavelet analysis of popular bivariate spectral quantities such as the coherence and phase spectra and the frequency-dependent time delay. The approach utilised here is based on the maximal overlap discrete Hilbert wavelet transform (MODHWT). Firstly, via a broad set of simulation experiments, we examine the small and large sample properties of two wavelet estimators of the scale-dependent time delay. The estimators are the wavelet cross-correlator and the wavelet phase angle-based estimator. Our results provide some practical guidelines for the empirical examination of short- and medium-term lead-lag relations for octave frequency bands. Further, we point out a deficiency in the implementation of the MODHWT and suggest using a modified implementation scheme, which was proposed earlier in the context of the dual-tree complex wavelet transform. In addition, we show how MODHWT-based wavelet quantities can serve to approximate the Fourier bivariate spectra and discuss issues connected with building confidence intervals for them. The discrete wavelet analysis of coherence and phase angle is illustrated with a scale-dependent examination of business cycle synchronisation between 11 euro zone countries. The study is supplemented by a wavelet analysis of the variance and covariance of the euro zone business cycles. The empirical examination underlines the good localisation properties and high computational efficiency of the wavelet transformations applied and provides new arguments in favour of the endogeneity hypothesis of the optimum currency area criteria as well as the wavelet evidence on dating the Great Moderation in the euro zone.
Business Problem: Health care is a very important domain in the market. It is directly linked to the life of the individual, so we must always be proactive in this domain. Money plays a major role, because treatment can sometimes be very costly, and an individual who is not covered by insurance can end up in a tough financial situation. Companies in the medical insurance business also want to reduce their risk by optimizing the insurance cost, because a healthy body is largely in the hands of the individual: if individuals eat healthily and exercise properly, the chance of getting ill is drastically reduced. Goal & Objective: The objective of this exercise is to build a model, using data, that provides the optimum insurance cost for an individual. You have to use the health- and habit-related parameters to estimate the cost of insurance.
Review Parameters Review points
1) Introduction of the business problem
a) Defining problem statement
b) Need of the study/project
c) Understanding business/social opportunity
2) Data Report
a) Understanding how data was collected in terms of time, frequency and methodology
b) Visual inspection of data (rows, columns, descriptive details)
c) Understanding of attributes (variable info, renaming if required)
3) Exploratory data analysis
a) Univariate analysis (distribution and spread for every continuous attribute, distribution of data in categories for categorical ones)
b) Bivariate analysis (relationship between different variables, correlations)
a) Removal of unwanted variables (if applicable)
b) Missing Value treatment (if applicable)
d) Outlier treatment (if required)
e) Variable transformation (if applicable)
f) Addition of new variables (if required)
4) Business insights from EDA
a) Is the data unbalanced? If so, what can be done? Please explain in the context of the business
b) Any business insights using clustering (if applicable)
c) Any other business insights
Pooling individual samples prior to DNA extraction can mitigate the cost of DNA extraction and genotyping; however, these methods need to accurately generate equal representation of individuals within pools. This data set was generated to determine accuracy of pool construction based on white blood cell counts compared to two common DNA quantification methods. Fifty individual bovine blood samples were collected, and then pooled with all individuals represented in each pool. Pools were constructed with the target of equal representation of each individual animal based on number of white blood cells, spectrophotometric readings, spectrofluorometric readings and whole blood volume with 9 pools per method and a total of 36 pools. Pools and individual samples that comprised the pools were genotyped using a commercially available genotyping array. ASReml was used to estimate variance components for individual animal contribution to pools. The correlation between animal contributions between two pools was estimated using bivariate analysis with starting values set to the result of a univariate analysis. The dataset includes: 1) pooling allele frequencies (PAF) for all pools and individual animals computed from normalized intensities for red (X) and green (Y); PAF = X/(X+Y). 2) Genotypes or number of copies of B (green) allele (0,1,2). 3) Definitions for each sample.
Resources in this dataset:
Resource Title: Pooling Allele Frequencies (paf) for all pools and individual animals. File Name: pafAnimal.csv.gz Resource Description: Pooling Allele Frequencies (paf) for all pools and individual animals computed from normalized intensities for red (X) and green (Y); paf = X / (X + Y)
Resource Title: Genotypes for individuals within pools. File Name: g.csv.gz Resource Description: Genotypes (number of copies of the B (green) allele (0,1,2)) for individual bovine animals within pools.
Resource Title: Sample Definitions. File Name: XY Data Key.xlsx Resource Description: Definitions for each sample (both pools and individual animals).
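The pooling allele frequency defined above, PAF = X / (X + Y), can be computed per marker from the normalized red (X) and green (Y) intensities. The following is a minimal sketch. The function name and the example intensities are hypothetical, and the layout of the actual CSV files is not assumed here.

```python
def pooling_allele_frequency(x, y):
    """Return PAF = X / (X + Y) for normalized red (X) and green (Y) intensities."""
    total = x + y
    if total == 0:
        raise ValueError("X + Y is zero; PAF is undefined for this marker")
    return x / total


# Hypothetical normalized intensities (X, Y) for three markers in one pool
intensities = [(0.90, 0.10), (0.45, 0.55), (0.05, 0.95)]
for x, y in intensities:
    print(f"X={x:.2f} Y={y:.2f} PAF={pooling_allele_frequency(x, y):.3f}")
```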
This data package was produced by researchers working on the Shortgrass Steppe Long Term Ecological Research (SGS-LTER) Project, administered at Colorado State University. Long-term datasets and background information (proposals, reports, photographs, etc.) on the SGS-LTER project are contained in a comprehensive project collection within the Digital Collections of Colorado (http://digitool.library.colostate.edu/R/?func=collections&collection_id=3429). The data table and associated metadata document, which is generated in Ecological Metadata Language, may be available through other repositories serving the ecological research community and represent components of the larger SGS-LTER project collection.
CPER Paleopedology Study – Particle and Grain Size - Grain size data from 39 pedons were compared with modal fluvial (7) and eolian (3) samples in order to characterize the origin of CPER parent materials and distinguish the origin of CPER geomorphic features. The seven fluvial sites were located along Owl and Eastman Creeks. The three eolian sites were located on the nearest undisputed dune fields, approximately 5 km north of Roggen, CO (Muhs, 1985). For statistical analysis, the sand and coarse silt fractions were shaken in a nest of half-phi (φ) interval sieves ranging from -1.0 φ (10 mesh) to 4.5 φ (325 mesh) for 3 minutes. Phi intervals (-log2) were utilized to normalize the particle size data for use in conventional statistics (Krumbein, 1934). The silt and clay fractions were separated by sedimentation using the pipette method. Statistical methods adopted from Folk and Ward (1957) were applied to the -1.0 φ to 7.0 φ fractions using the Sedimentary Petrology Computer Program SEDPET (Warner, 1970) to determine mean grain size (Mz), sorting (Iz), skewness (Skz), and kurtosis (Kz). These parameters were then subjected to univariate and bivariate analysis. The clay fraction was not included in the statistical computations, to avoid skewing the sample excessively toward fine sizes. Additional information and referenced materials can be found at http://hdl.handle.net/10217/85625.
Resources in this dataset:
Resource Title: Website Pointer to html file. File Name: Web Page, url: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-sgs&identifier=168 Webpage with information and links to data files for download
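The phi scale used in the grain-size description above is the negative base-2 logarithm of the grain diameter. The following is a minimal sketch of that conversion together with the Folk and Ward graphic mean and inclusive graphic standard deviation (sorting). It is not the SEDPET program. The sieve openings in millimetres (2.0 mm for 10 mesh, 0.044 mm for 325 mesh) and the percentile values are assumptions supplied for illustration.

```python
import math


def mm_to_phi(diameter_mm):
    """Krumbein phi value: phi = -log2(diameter in millimetres)."""
    return -math.log2(diameter_mm)


def graphic_mean(phi16, phi50, phi84):
    """Folk and Ward graphic mean, Mz = (phi16 + phi50 + phi84) / 3."""
    return (phi16 + phi50 + phi84) / 3


def inclusive_graphic_sd(phi5, phi16, phi84, phi95):
    """Folk and Ward inclusive graphic standard deviation (sorting)."""
    return (phi84 - phi16) / 4 + (phi95 - phi5) / 6.6


# Assumed sieve openings for the coarsest and finest sieves
print(f"10 mesh  (2.0 mm):   {mm_to_phi(2.0):.2f} phi")
print(f"325 mesh (0.044 mm): {mm_to_phi(0.044):.2f} phi")

# Hypothetical percentiles (in phi) read from a cumulative curve
p5, p16, p50, p84, p95 = 1.0, 1.6, 2.4, 3.2, 3.9
print(f"Mz = {graphic_mean(p16, p50, p84):.2f} phi")
print(f"sorting = {inclusive_graphic_sd(p5, p16, p84, p95):.2f} phi")
```

Because coarser grains have smaller phi values, a 2.0 mm opening maps to -1.0 φ and a 0.044 mm opening maps to roughly 4.5 φ, which agrees with the sieve range quoted in the description.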
This study was conducted to address the drop in rates of residential placement of adjudicated youth after the 1990s. Policymakers, advocates, and researchers began to attribute the decline to reform measures and proposed that this was the cause of the drop seen in historic national crime. In response, researchers set out to use state-level data on economic factors, crime rates, political ideology scores, and youth justice policies and practices to test the association between the youth justice policy environment and recent reductions in out-of-home placements for adjudicated youth. This data collection contains two files: a multivariate analysis file and a bivariate analyses file. In the multivariate file, the aim was to assess the impact of the progressive policy characteristics on the dependent variable, youth confinement. In the bivariate analyses file (Wave 1-Wave 10), the aim was to assess the states as divided into two groups across all 16 dichotomized variables that comprised the progressive policy scale: those with more progressive youth justice environments and those with less progressive or punitive environments. Examples of these dichotomized variables include purpose clause, courtroom shackling, and competency standard.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Conventional research methodologies and data analytic approaches in psychiatric research are unable to reliably infer causal relations without experimental designs, or to make inferences about the functional properties of the complex systems in which psychiatric disorders are embedded. This article describes a series of studies to validate a novel hybrid computational approach–the Complex Systems-Causal Network (CS-CN) method–designed to integrate causal discovery within a complex systems framework for psychiatric research. The CS-CN method was first applied to an existing dataset on psychopathology in 163 children hospitalized with injuries (validation study). Next, it was applied to a much larger dataset of traumatized children (replication study). Finally, the CS-CN method was applied in a controlled experiment using a ‘gold standard’ dataset for causal discovery and compared with other methods for accurately detecting causal variables (resimulation controlled experiment). The CS-CN method successfully detected a causal network of 111 variables and 167 bivariate relations in the initial validation study. This causal network had well-defined adaptive properties and a set of variables was found that disproportionally contributed to these properties. Modeling the removal of these variables resulted in significant loss of adaptive properties. The CS-CN method was successfully applied in the replication study and performed better than traditional statistical methods, and similarly to state-of-the-art causal discovery algorithms in the causal detection experiment. The CS-CN method was validated, replicated, and yielded both novel and previously validated findings related to risk factors and potential treatments of psychiatric disorders. The novel approach yields both fine-grain (micro) and high-level (macro) insights and thus represents a promising approach for complex systems-oriented research in psychiatry.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports a meta-analytic structural equation modelling (MASEM) study investigating the factors influencing students’ behavioural intention to use educational AI (EAI) technologies. The research integrates constructs from the Technology Acceptance Model (TAM), Theory of Planned Behaviour (TPB), and Artificial Intelligence Literacy (AIL), aiming to resolve inconsistencies in previous studies and improve theoretical understanding of EAI technology adoption.
Research Hypotheses: The study hypothesized that: (1) students' behavioural intention (INT) to use EAI technologies is influenced by perceived usefulness (PU), perceived ease of use (PEU), attitude (ATT), subjective norm (SN), and perceived behavioural control (PBC), as described in TAM and TPB; (2) AI literacy (AIL) directly and indirectly predicts PU, PEU, ATT, and INT; (3) these relationships are moderated by contextual factors such as academic level (K–12 vs. higher education) and regional economic development (developed vs. developing countries).
What the Data Shows: The meta-analytic dataset comprises 166 empirical studies involving over 69,000 participants. It includes pairwise Pearson correlations among seven constructs (PU, PEU, ATT, SN, PBC, INT, AIL) and is used to compute a pooled correlation matrix. This matrix was then used to test three models via MASEM: a baseline TAM-TPB model, an internal-extended model with additional TPB internal paths, and an AIL-integrated extended model. The AIL-integrated model achieved the best fit (CFI = 0.997, RMSEA = 0.053) and explained 62.3% of the variance in behavioural intention.
Notable Findings: AI literacy (AIL) is the strongest predictor of intention to use EAI technologies (Total Effect = 0.408). PU, ATT, and SN also significantly influence intention. The effect of PEU on intention is fully mediated by PU and ATT. Moderation analysis showed that the relationships differ between developed and developing countries and between K–12 and higher education populations.
How the Data Can Be Interpreted and Used: The dataset includes bivariate correlations between variables, publication metadata, sample sizes, coding information, and reliability values (e.g., CR scores). It is suitable for replication of MASEM procedures, moderation analysis, and meta-regression. Researchers may use it to test additional theoretical models or assess the influence of new moderators (e.g., AI tool type). Educators and policymakers can leverage insights from the meta-analytic results to inform AI literacy training and technology adoption strategies.
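Pooling the pairwise correlations described above means combining, for each pair of constructs, the correlations reported by individual studies into one estimate. The following is a deliberately simplified sketch that uses a sample-size-weighted mean for a single pair of constructs. This is a swapped-in simplification, not the MASEM pooling procedure used in the study, and the function name, study values, and sample sizes are hypothetical.

```python
def pooled_correlation(studies):
    """Sample-size-weighted mean of correlations.

    `studies` is a list of (r, n) pairs: the reported Pearson correlation
    and the sample size of each study.
    """
    total_n = sum(n for _, n in studies)
    if total_n == 0:
        raise ValueError("total sample size must be positive")
    return sum(r * n for r, n in studies) / total_n


# Hypothetical PU-INT correlations from three studies
pu_int = [(0.52, 310), (0.61, 1200), (0.47, 450)]
print(f"pooled r = {pooled_correlation(pu_int):.3f}")
```

Repeating this for every pair of the seven constructs would fill a 7 x 7 matrix, although a full MASEM analysis also models the sampling covariance among correlations, which this sketch ignores.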
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data collection contains selected evidence from the quantitative work package of the research project Information technologies in public policy. Critical analysis of the profiling the unemployed in Poland. The project received funding from the National Science Centre (2016/23/B/HS5/00889) and was based at the Faculty of Philosophy and Sociology (now Faculty of Sociology) of the University of Warsaw. The research team, led by Dr Karolina Sztandar-Sztanderska, included Dr Michał Kotnarowski, Dr Marianna Zieleńska, Alicja Palęcka, Dr Barbara Godlewska-Bujok, Dr Jędrzej Niklas, and Dr Joanna Mazur. The quantitative research was carried out by the Dyspersja company: Anna Chrościcka, Tomasz Płachecki, and Mikołaj Bendyk. Work on the questionnaire was led by Karolina Sztandar-Sztanderska, with significant input from Michał Kotnarowski, Alicja Palęcka, and the Dyspersja company. The sampling method was designed by Michał Kotnarowski.
Additionally, we make available the results of a bivariate analysis based on this dataset, performed within the project AUTO-WELF: Automating Welfare - Algorithmic Infrastructures for Human Flourishing in Europe at the Institute of Philosophy and Sociology of the Polish Academy of Sciences (IFiS PAN). The AUTO-WELF project is supported by the National Science Centre, Poland (grant no. 2021/03/Y/HS5/00263) under the CHANSE ERA-NET Co-fund programme, which has received funding from the European Union's Horizon 2020 Research and Innovation Programme, under Grant Agreement no 101004509. The analysis was designed by Dr Karolina Sztandar-Sztanderska and performed by Dr Tomasz Żółtak.
PROJECT DESCRIPTION
The contemporary state relies increasingly on information technologies (IT) that automate decision-making. Public institutions collect data on many aspects of our lives, such as paid work, family life, or health conditions. IT enables the processing of that information on a scale not encountered before. As a result, automated decision-making systems (ADM) have become an inherent part of the policy-making process and are often believed to be scientific, as they use statistical models and algorithms that are difficult for the general public to understand. No matter how sophisticated IT appears to be, the automation of the state also entails risks, which we have tried to point out in this project by analyzing a specific example of IT that was applied in the Polish public employment services (PES, in Polish: powiatowe urzędy pracy) between 2014 and 2019 to measure the 'employability' of unemployed people and inform decision-making on the distribution of active labour market policies by the PES.
The aim of the project was to obtain empirically grounded knowledge on the development process of the profiling algorithm, the principles of its operation (Sztandar-Sztanderska, Kotnarowski, Zieleńska 2021), the values and norms underlying it (Sztandar-Sztanderska, Zieleńska 2020), the ways in which it was used in administrative practice by frontline staff in the PES (Sztandar-Sztanderska, Zieleńska 2018, 2022; Palęcka, Sztandar-Sztanderska 2024), and the ways in which it was assessed by state control bodies (e.g. the ombudsman, audit office, data protection authority). The results of this research provided insights into the new risks emerging from the use of automated decision-making systems in public policy concerning democracy, rule of law, and social and labour rights (Godlewska-Bujok 2020, 2021).
The research was interdisciplinary, conducted by sociologists and legal scholars, combining qualitative and quantitative methods and taking into account the perspectives of different actors implicated in algorithmic profiling. In the work package concerning the development of the profiling algorithm, we conducted an analysis of regulations and formal documents (see open data repository: https://doi.org/10.18150/PRGRH1), as well as individual in-depth interviews with actors involved in the design of the algorithm and the legislative process, and with representatives of state control bodies that could assess the compliance of IT with the law. We also analysed the algorithm itself and the statistical model, comparing them with profiling models used in other countries (Sztandar-Sztanderska, Kotnarowski, Zieleńska 2021). Finally, we conducted in-depth case studies in four selected local PES (Sztandar-Sztanderska, Zieleńska 2018, 2022; Palęcka, Sztandar-Sztanderska 2024) and a representative quantitative survey of PES employees that we publish in this open data repository.
In the PADS open data repository we make available: the data set from the CAWI/CATI representative survey conducted in 190 PES (in Polish); the codebook containing basic information concerning the survey (in Polish), the survey (in English and Polish), and descriptive statistics; and the results of a bivariate analysis on caseworkers' oversight over the profiling algorithm (in English and Polish).
INFORMATION TECHNOLOGIES IN PUBLIC POLICY. CRITICAL ANALYSIS OF THE PROFILING OF THE UNEMPLOYED IN POLAND (English translation of the Polish-language description)
The collection contains selected materials from the quantitative module of the research project Technologie informacyjne w polityce publicznej. Krytyczna analiza profilowania bezrobotnych (Information technologies in public policy. Critical analysis of the profiling of the unemployed). The project was funded by the National Science Centre (2016/23/B/HS5/00889) and was carried out at the University of Warsaw, at the Faculty of Philosophy and Sociology (now the Faculty of Sociology). The research team, led by Dr Karolina Sztandar-Sztanderska, comprised Dr Marianna Zieleńska, Dr Michał Kotnarowski, Alicja Palęcka, Dr Jędrzej Niklas, Dr Barbara Godlewska-Bujok, and Dr Joanna Mazur. The quantitative study was carried out by the Dyspersja company: Anna Chrościcka (lead), Tomasz Płachecki (lead), and Mikołaj Bendyk. Work on the questionnaire was led by Karolina Sztandar-Sztanderska, with contributions from Michał Kotnarowski, Alicja Palęcka, and the Dyspersja company. The sampling was designed by Michał Kotnarowski.
As a separate file, we make available the results of a bivariate analysis related to the dataset, carried out within the project AUTO-WELF: Automating Welfare - Algorithmic Infrastructures for Human Flourishing in Europe at the Institute of Philosophy and Sociology of the Polish Academy of Sciences. The analysis was designed by Dr Karolina Sztandar-Sztanderska and performed by Dr Tomasz Żółtak.
Project description
The contemporary state increasingly uses modern information technologies (IT) to automate decision-making. Public institutions collect data on many dimensions of our lives, and IT makes it possible to process this information on an unprecedented scale. IT is used in designing public policy on the basis of seemingly scientific premises and methods. However, the digitalization of the state and automation entail risks, which we tried to capture in this project by analyzing the specific case of the algorithm for profiling the unemployed, used in 2014–2019 to measure the employment potential of unemployed people and to support decisions on the distribution of active labour market programmes in county labour offices (powiatowe urzędy pracy, PUP).
The main aim of the project was to obtain empirically grounded knowledge about the process of creating the profiling algorithm, the principles of its operation, the ways it was used in administrative practice, and the actions taken by state control bodies. The results of this research revealed a number of challenges that the use of this kind of IT in public policy poses for a democratic state governed by the rule of law.
The project was interdisciplinary: it combined sociological and legal approaches as well as qualitative and quantitative research methods, and it took into account the perspectives of different actors. In the part concerning the process of creating the profiling algorithm, we analysed regulations and formal documents (available in another repository, https://doi.org/10.18150/PRGRH1) and conducted individual in-depth interviews with actors involved in designing the instrument and in the legislative process, and with representatives of public institutions that could assess the compliance of the IT's operating principles with the law (NIK, RPO, GIODO). We also analysed the technology itself and the statistical procedures, comparing the Polish variant of profiling with models used abroad (Sztandar-Sztanderska, Kotnarowski, Zieleńska 2021). In the last part, we conducted in-depth case studies in four purposively selected counties and a representative quantitative survey of PUP employees, the results of which we publish in this data collection.
Data collection
In the Polish Social Data Archive (Polskie Archiwum Danych Społecznych, PADS) we make available the following materials from the representative quantitative survey of labour offices: the database, an SPSS .sav file (in Polish); the survey documentation, containing, among other things, basic information about the quantitative survey, the questionnaire (in Polish and in English translation), count and frequency distributions, and descriptive statistics; and the results of a bivariate analysis of the factors influencing the oversight exercised by PUP employees over the profiling results generated by the algorithm (so-called human oversight over ADM, human in the loop) (in Polish and in English translation).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Previous genome-wide association studies on anthropometric measurements have identified more than 100 related loci, but only a small portion of heritability in obesity was explained. Here we present a bivariate twin study to look for the genetic variants associated with body mass index and waist-hip ratio, and to explore the obesity-related pathways in Northern Han Chinese. Cholesky decomposition model for 242 monozygotic and 140 dizygotic twin pairs indicated a moderate genetic correlation (r = 0.53, 95%CI: 0.42–0.64) between body mass index and waist-hip ratio. Bivariate genome-wide association analysis in 139 dizygotic twin pairs identified 26 associated SNPs with p < 10⁻⁵. Further gene-based analysis found 291 nominally associated genes (P < 0.05), including F12, HCRTR1, PHOSPHO1, DOCK2, DOCK6, DGKB, GLP1R, TRHR, MMP1, GPR55, CCK, and OR2AK2, as well as 6 enriched gene-sets with FDR < 0.05. Expression quantitative trait loci analysis identified rs2242044 as a significant cis-eQTL in both the normal adipose-subcutaneous (P = 1.7 × 10⁻⁹) and adipose-visceral (P = 4.4 × 10⁻¹⁵) tissue. These findings may provide an important entry point to unravel genetic pleiotropy in obesity traits.
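In a bivariate Cholesky decomposition such as the one mentioned above, the additive genetic covariance matrix is written as the product of a lower-triangular matrix of path coefficients and its transpose, and the genetic correlation follows from that matrix. The following is a minimal sketch of that last step only, not the twin model fitted in the study. The path coefficient names a11, a21, a22 and their values are hypothetical.

```python
import math


def genetic_correlation(a11, a21, a22):
    """Genetic correlation implied by lower-triangular genetic paths.

    With L = [[a11, 0], [a21, a22]], the genetic covariance matrix is
    L @ L.T, so var1 = a11^2, var2 = a21^2 + a22^2, cov = a11 * a21.
    """
    var_trait1 = a11 ** 2
    var_trait2 = a21 ** 2 + a22 ** 2
    cov = a11 * a21
    return cov / math.sqrt(var_trait1 * var_trait2)


# Hypothetical path coefficients for two traits
r_g = genetic_correlation(a11=0.8, a21=0.4, a22=0.6)
print(f"genetic correlation = {r_g:.3f}")
```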
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract Objective: to evaluate the prevalence of halitosis and associated factors in institutionalized elderly persons. Methods: a sectional study was performed with 268 elderly persons from 11 long-term care institutions in Natal in the northeast of Brazil. Data collection included an oral epidemiologic examination and questions about self-perception of oral health, as well as a consultation of medical records and the application of a questionnaire to the directors of the institutions. Halitosis was measured using the organoleptic test. The independent variables were oral, sociodemographic, institutional, general health and functional conditions. Bivariate analysis was performed using the Pearson chi-square test and Fisher's exact test, and the magnitude of effect was verified by the prevalence ratio for the independent variables in relation to the outcome, with a 95% confidence level. Results: the prevalence of halitosis was 26.1%, which was exhaled by the mouth in 98.57% of cases and by the nose in 10% of cases. Prevalence was 43% higher among non-white individuals (p=0.006); 65% higher among those living in non-profit institutions (p=0.039); 52% higher in elderly persons with oriented cognitive status (p=0.047); 41% higher in elderly persons with root caries (p=0.029); 62% higher in those who did not use dentures (p=0.046); 57% lower in edentulous persons (p
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The field of community ecology is evolving rapidly as researchers are able to tie functions of systems to variation in taxa. In inferring processes, functions, and causal taxa, common practice is to assume a ‘core’ community can be defined. The core refers to a group of taxa found across samples, and statistically, is the discretization or categorization of continuous data. Assuming thresholds in abundance exist, and that a core microbiome exists, has the potential to be misleading. Rather, the existence of a core set of taxa should be treated as a hypothesis with support from empirical observations. An additional challenge is that there is no standard set of criteria for core membership. Consequently, comparison across studies is often impossible. We considered four common methods for defining a core and applied them to 25 simulations that cover a range of plausible communities and two published microbial data sets. Next, we used hierarchical clustering and bivariate plots of mean taxon abundance and variance to evaluate each method. Assignment of taxa to the core varied substantially among methods. Across simulations and published data sets, hierarchical clustering of taxa based on their abundance and prevalence (variation) offered no support for a core set of taxa. The categorization of taxa into sets corresponding to a core community and other taxa has the potential to be misleading. Given that the concept of core communities received poor support from data, the concept is questionable and should not be used without testing its validity in any particular context.
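The sensitivity of core membership to the chosen criterion can be sketched with a toy example. The two rules below (a prevalence threshold and a mean relative abundance threshold) are illustrative assumptions and are not necessarily among the four methods the study compared; the taxon names, counts, and thresholds are hypothetical.

```python
def prevalence(counts):
    """Fraction of samples in which a taxon is present (count > 0)."""
    return sum(1 for c in counts if c > 0) / len(counts)

def core_by_prevalence(table, min_prevalence):
    """Taxa present in at least `min_prevalence` of the samples."""
    return {t for t, counts in table.items() if prevalence(counts) >= min_prevalence}

def core_by_mean_relative_abundance(table, min_share):
    """Taxa whose mean relative abundance across samples is at least `min_share`."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[j] for counts in table.values()) for j in range(n_samples)]
    core = set()
    for taxon, counts in table.items():
        shares = [c / total for c, total in zip(counts, totals)]
        if sum(shares) / n_samples >= min_share:
            core.add(taxon)
    return core

# Hypothetical count table: taxon -> counts in four samples.
table = {
    "taxon_A": [50, 40, 60, 55],
    "taxon_B": [1, 2, 1, 3],
    "taxon_C": [0, 90, 0, 0],
    "taxon_D": [5, 0, 4, 6],
}

print(sorted(core_by_prevalence(table, 1.0)))
print(sorted(core_by_mean_relative_abundance(table, 0.10)))
```

With these made-up counts, the rare but ubiquitous taxon_B is core only under the prevalence rule, and the locally dominant taxon_C only under the abundance rule. This is the kind of disagreement among methods that the abstract describes.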
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Discrete character-taxon matrices are increasingly being used in an attempt to understand the pattern and tempo of morphological evolution; however, methodological sophistication and bespoke software implementations have lagged behind. In the present study, an attempt is made to provide a state-of-the-art description of methodologies and introduce a new R package (Claddis) for performing foundational disparity (morphologic diversity) and rate calculations. Simulations using its core functions show that: (1) of the two most commonly used distance metrics (Generalized Euclidean Distance and Gower's Coefficient), the latter tends to carry forward more of the true signal; (2) a novel distance metric may improve signal retention further; (3) this signal retention may come at the cost of pruning incomplete taxa from the data set; and (4) the utility of bivariate plots of ordination spaces is undermined by their frequently extremely low variances. By contrast, challenges to estimating morphologic tempo are presented qualitatively, such as how trees are time-scaled and changes are counted. Both disparity and rates deserve better time series approaches that could unlock new macroevolutionary analyses. However, these challenges need not be fatal, and several potential future solutions and directions are suggested.
Usage Notes
Matrix used for the tutorial: tutorial_matrix.nex
Ages file for the tutorial data set: tutorial_ages.txt
R code for the tutorial: tutorial_code.r
R code used for the simulations: simulation_code.r
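To make the distance-metric comparison in this record more concrete, here is a minimal Python sketch of a Gower-style dissimilarity between two taxa scored for unordered discrete characters with missing data. It is not the Claddis implementation (which is in R); the function name, the use of None for missing scores, and the example states are assumptions made for illustration.

```python
def gower_dissimilarity(taxon_a, taxon_b):
    """Share of mismatching characters among those scored in both taxa.

    Characters are treated as unordered; None marks a missing score.
    Returns None when the two taxa share no scored characters.
    """
    comparable = [
        (a, b) for a, b in zip(taxon_a, taxon_b)
        if a is not None and b is not None
    ]
    if not comparable:
        return None  # distance is undefined without comparable characters
    mismatches = sum(1 for a, b in comparable if a != b)
    return mismatches / len(comparable)

# Hypothetical character scores for two taxa (None = missing).
taxon_1 = ["0", "1", None, "1"]
taxon_2 = ["0", "0", "1", None]
print(gower_dissimilarity(taxon_1, taxon_2))
```

Because the denominator counts only characters scored in both taxa, pairs of very incomplete taxa can end up with undefined distances, which is one way to see why incomplete taxa may need to be pruned.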
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the context of multivariate multilevel data analysis, this paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects when the dimensional residual terms are assumed uncorrelated. Using the EM algorithm, we propose more general expressions for the estimators of the model's parameters. These estimators can be used in multivariate longitudinal data analysis as well as in the more general analysis of multivariate multilevel data. Using a likelihood ratio test, we test the significance of the correlations between the random effects of two dependent variables of the model, in order to investigate whether it is useful to model these dependent variables jointly. Simulation studies are conducted to assess both the parameter recovery performance of the EM estimators and the power of the test. The usefulness of the test is illustrated using two empirical data sets, one of longitudinal multivariate type and one of multivariate multilevel type.
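The likelihood ratio test mentioned in this abstract can be sketched as follows, assuming the usual chi-square reference distribution with degrees of freedom equal to the number of correlations fixed to zero under the null. The log-likelihood values and the degrees of freedom below are hypothetical, and the abstract does not state which reference distribution the authors used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_k x^(k-1/2) / (2k-1)!!
    total = 0.0
    term = math.sqrt(x)
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return the LRT statistic 2 * (l_alt - l_null) and its chi-square p-value."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    return statistic, chi2_sf(statistic, df)

# Hypothetical log-likelihoods: separate models (null) versus joint model (alternative).
stat, p_value = likelihood_ratio_test(loglik_null=-1234.6, loglik_alt=-1229.1, df=4)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that the random effects of the two dependent variables are correlated, so that modelling them jointly is worthwhile.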
The central research question is: “In what socio-economic contexts linked to energy vulnerability are collective action initiatives (CAIs) distributed at the sub-regional level, and does this vary by type and scale of energy infrastructure?”. The study pursues three objectives: (1) allocate and map CAIs from knowledge inventories and local energy infrastructure configurations; (2) conduct evidence-based spatial analysis connecting socio-economic and energy vulnerability-related indicators with CAIs’ local energy configurations in Spain; and (3) apply parallel statistical normalization and standardization methods to compare and interpret the energy vulnerability context across mapped units of Spanish municipalities.
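The difference between normalization and standardization in objective (3) can be illustrated with a short sketch. It assumes min-max rescaling for normalization and z-scores (using the population standard deviation) for standardization; the record does not specify the exact methods, and the indicator values below are hypothetical.

```python
import statistics

def min_max_normalize(values):
    """Rescale values to [0, 1]; missing entries (None) stay missing."""
    observed = [v for v in values if v is not None]
    low, high = min(observed), max(observed)
    return [None if v is None else (v - low) / (high - low) for v in values]

def z_standardize(values):
    """Centre on the mean and scale by the population standard deviation."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    sd = statistics.pstdev(observed)
    return [None if v is None else (v - mean) / sd for v in values]

# Hypothetical indicator values for four municipalities (None = no data).
indicator = [10.0, 20.0, None, 30.0]
print(min_max_normalize(indicator))
print(z_standardize(indicator))
```

Min-max scores depend on the extreme municipalities, whereas z-scores express each municipality relative to the mean and spread, so applying both in parallel lets the two views be compared.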
The data are mined, mapped and cleaned tabular data covering 8,131 municipalities across 52 provinces within Spain's 17 Autonomous Communities, used for aggregated spatial analysis. All_Spain_merged_table.csv contains:
- socio-economic indicators related to energy vulnerability, as an analytical application of energy justice [Atlas de distribución de renta de los hogares 2021; Population and Housing Census 2021; Atlas de la edificación residencial 2011];
- allocated Collective Action Initiatives (CAIs) from the ENBP All-European Inventory database [https://doi.org/10.18710/2CPQHQ]; and
- the municipal electricity generation mix, processed from the Spanish public administrative register of electric power production facilities (https://energia.serviciosmin.gob.es/Pretor/).
Simplified_categories_of REGISTER_by_Real Decreto 413_2014.xls: - explanation of the categories used to code the data exported from the administrative register of electric power production facilities.
CAIs_and_Indicators_Correlation_Table.html: - bivariate correlation analysis matrix of collected indicators for CAIs subsample.
Note: the 60M indicator (share of population below 60% of median income) has limited data coverage (available for 49% of all municipalities and 77% of those with CAIs). This restricted further analysis, and the indicator was excluded from the mapping of categorized energy vulnerability based on SVI. Data availability for the economic indicators ranged from 82% (Gini index, mean income per household) to 99% (mean income per person); from version 3 onward, missing values were therefore estimated using linear regression to provide a more complete dataset for analysis. Earlier versions of the CAIs_Sample.csv file contained discrepancies: missing economic indicator values (Gini and income) were incorrectly set to zero instead of being treated as missing, and PV indicators showed shifted values across municipalities compared to the reference dataset. In version 4, missing values for all indicators were properly estimated using linear regression. The full-country sample file remained the reference throughout all versions. These issues have now been corrected, with limited indicators adjusted as described above.
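The imputation approach described in the note can be sketched as a simple regression fitted on complete cases. The choice of predictor (mean income per person, which has the highest coverage) is an assumption made here for illustration; the note does not say which predictors were used, and the values are hypothetical. Missing entries are represented as None rather than zero, which avoids the discrepancy reported for earlier versions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def impute_by_regression(predictor, target):
    """Fill missing target values (None) with predictions from complete cases."""
    complete = [(x, y) for x, y in zip(predictor, target) if y is not None]
    slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [y if y is not None else slope * x + intercept
            for x, y in zip(predictor, target)]

# Hypothetical values in thousands of euros; None marks a missing observation.
income_per_person = [10.0, 12.0, 14.0, 16.0]
income_per_household = [25.0, None, 35.0, 40.0]
print(impute_by_regression(income_per_person, income_per_household))
```

Coding a missing value as 0 would instead pull the fitted line and any averages toward zero, which is why missing values need to be kept distinct from true zeros.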