62 datasets found

Multivariate analysis of Pleurodeles waltl injection outcomes using FAMD...
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 10, 2024
Cite
Eliza Jaeger (2024). Multivariate analysis of Pleurodeles waltl injection outcomes using FAMD FactoMineR (RStudio 4.3.2) used to probe sources of variability during an Adeno-associated viral (AAV) screen [Dataset]. http://doi.org/10.5061/dryad.mpg4f4r89
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.mpg4f4r89
Dataset updated
Oct 10, 2024
Dataset provided by
Columbia University
Authors
Eliza Jaeger
License
https://spdx.org/licenses/CC0-1.0.html
Description
In this study, we analyzed the efficiency of different adeno-associated viral (AAV) injections in transfecting neurons in the salamander Pleurodeles waltl. To query sources of variation in AAV injection outcomes, we analyzed metadata and outcomes for an additional 68 intraparenchymal injections performed 39 in post-metamorphic Pleurodeles salamanders. This dataset included both quantitative variables (age, weight, viral genomes (v.g.) injected, and a cell transduction score ranging from 0 to 4, see STAR Methods) and categorical variables (here referred to as qualitative variables: serotype, promoter, reporter, single vs. dual injection, manufacturer, and injection site). To assess the associations between these variables, we performed a Factor Analysis for Mixed Data (FAMD), a principal component method designed to identify significant sources of variability within datasets that contain both quantitative and qualitative data types. This dataset contains the injection outcomes and metadata for AAV injections administered in the salamander Pleurodeles waltl. The FactoMineR package (https://CRAN.R-project.org/package=FactoMineR) was used to determine which variables contribute significantly to injection outcomes. Analysis of these data revealed that, among other factors, age co-varies with injection score. Therefore, we conclude that increased animal age decreases the efficacy of this tool.
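As a rough illustration of how FAMD puts quantitative and qualitative variables on a common footing before its principal component step, here is a minimal Python sketch of the preprocessing only. It is not the FactoMineR code used for this dataset: the function name `famd_preprocess`, the column names, and the values are hypothetical, and the principal component analysis that would follow is left out.

```python
import math

def famd_preprocess(quant_cols, qual_cols):
    """Build the centered, weighted matrix on which FAMD runs its PCA.

    quant_cols: dict name -> list of floats (quantitative variables)
    qual_cols:  dict name -> list of labels (qualitative variables)
    Returns (column_names, rows).
    """
    n = len(next(iter(quant_cols.values())))
    names, columns = [], []

    # Quantitative variables: center and scale to unit variance.
    # (A constant column would give sd == 0; not handled in this sketch.)
    for name, values in quant_cols.items():
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        names.append(name)
        columns.append([(v - mean) / sd for v in values])

    # Qualitative variables: one 0/1 indicator per category, divided by
    # the square root of the category's proportion, then centered.
    for name, labels in qual_cols.items():
        for category in sorted(set(labels)):
            p = labels.count(category) / n
            scaled = [(1.0 if lab == category else 0.0) / math.sqrt(p)
                      for lab in labels]
            mean = sum(scaled) / n
            names.append(f"{name}={category}")
            columns.append([x - mean for x in scaled])

    rows = [list(r) for r in zip(*columns)]
    return names, rows

# Made-up illustrative values, not taken from the dataset.
quant = {"age_months": [6.0, 9.0, 12.0, 18.0]}
qual = {"serotype": ["A", "B", "A", "C"]}
names, rows = famd_preprocess(quant, qual)
print(names)
```

With this weighting, each quantitative column has variance 1 and the indicator columns of a qualitative variable with K categories have total variance K - 1, so neither variable type dominates the analysis merely because of how it is encoded.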
Differential Privacy Challenge - Sprint 3
kaggle.com
Updated Sep 6, 2021
Cite
Jim King (2021). Differential Privacy Challenge - Sprint 3 [Dataset]. https://www.kaggle.com/jimking100/differential-privacy-challenge-sprint-3/tasks
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 6, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jim King
Description
Context

Differential Privacy Temporal Map Challenge - Sprint 3 Data

Content

The dataset includes quantitative and categorical information about taxi trips in Chicago, including time, distance, location, payment, and service provider. The data includes several features along with time segments (trip_day_of_week and trip_hour_of_day), map segments (pickup_community_area and dropoff_community_area), and simulated individuals (taxi_id).

Acknowledgements

The data was provided by NIST PSCR for Sprint 3 of the Differential Privacy Temporal Map Challenge.

Inspiration

The data can be used to test solutions in the differential privacy field.
Data from: States and International Criminal Justice: COST CA18228 Scoping...
find.data.gov.scot
dtechtive.com
csv, pdf, txt
Updated Nov 6, 2023
Cite
University of Edinburgh. School of Law (2023). States and International Criminal Justice: COST CA18228 Scoping Survey (version 2) [Dataset]. http://doi.org/10.7488/ds/7536
Explore at:
Available download formats: csv (0.0082 MB), pdf (0.2284 MB), txt (0.0166 MB), txt (0.0025 MB), pdf (2.284 MB)
Unique identifier
https://doi.org/10.7488/ds/7536
Dataset updated
Nov 6, 2023
Dataset provided by
University of Edinburgh. School of Law
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
UNITED STATES
Description
The data consists of two elements, both derived from a survey developed and administered through the EU COST Action CA18228 (Global Atrocity Justice Constellations). The first element is made up of quantitative and categorical data; the second of qualitative text responses. The survey seeks to record and measure different elements of states' engagement with international criminal justice, including the integration of relevant provisions into domestic law; cooperation with and support for international and hybrid courts; various policy measures around prosecution of crimes defined in international law, and for the support of victims of such crimes; domestic prosecutions; NGO activity; and memorialisation, museums and other cultural activities. The survey covers 23 countries in Africa, the Americas, Asia and Europe. In each case, data was provided by individual scholars or teams of scholars coming from, working in, or working on the country in question. In some instances additional support was provided by NGOs or governmental agencies.
Data from: Prevalence and individual level enablers and barriers for...
data.niaid.nih.gov
search.dataone.org
+1 more
zip
Updated Jun 24, 2024
+ more versions
Cite
Waqo Boru; George Makalliwa; Caroline Musita (2024). Prevalence and individual level enablers and barriers for COVID-19 vaccine uptake among adult tuberculosis patients attending selected clinics in Nairobi County, Kenya [Dataset]. http://doi.org/10.5061/dryad.zcrjdfnms
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.zcrjdfnms
Dataset updated
Jun 24, 2024
Dataset provided by
Jomo Kenyatta University of Agriculture and Technology
Authors
Waqo Boru; George Makalliwa; Caroline Musita
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Nairobi, Kenya
Description
Although vaccination is a cost-effective, equitable, and impactful public health intervention in curbing the spread of infectious disease, low uptake is a significant concern, especially among high-risk population groups. Nearly half of the population is unvaccinated in Nairobi, yet there is a shortage of vaccination information on vulnerable tuberculosis (TB) patients. An interplay of factors influences uptake, and protecting this vulnerable group and the general population from severe disease, hospitalization, and death is worthwhile. The purpose of this study is to determine the prevalence and individual-level enablers and barriers to COVID-19 vaccine uptake among adult TB patients attending selected clinics in Nairobi County, Kenya. This cross-sectional mixed-method study was conducted at TB clinics across six sub-counties in Nairobi County. It included 388 participants sampled from each clinic’s TB register. Quantitative data was collected using a questionnaire, and qualitative data was collected through key informant interviews and focus group discussions. Quantitative data was analyzed using descriptive statistics (frequencies and percentages for categorical variables and mean and standard deviation for continuous variables) and inferential statistics (logistic regression). Qualitative data was analyzed through deductive coding and thematic analysis. The prevalence of COVID-19 vaccination was 46.1%, with 38.1% receiving complete vaccination. Mistrust in vaccine management (adjusted odds ratio (aOR) = 0.075, 95% confidence interval (CI): 0.025-0.229, p < 0.001) was a significant barrier to COVID-19 vaccine uptake. Perceived COVID-19 susceptibility (aOR = 2.901, 95% CI: 1.258-6.688, p = 0.012) and perceived COVID-19 seriousness (aOR = 3.294, 95% CI: 1.130-9.604, p = 0.029) were significant enablers of COVID-19 vaccine uptake. 
Qualitative themes for individual-level barriers to COVID-19 vaccine uptake were fear of side effects, stigma, myths, and mistrust in the messaging; themes for enablers were the desire to protect others and risk perception. The study revealed critical individual-level factors related to COVID-19 vaccine uptake. Methods. Design and setting: The study was an analytical cross-sectional study based on a mixed method (quantitative and qualitative) approach. Research assistants started with collecting quantitative data and later assisted investigators in conducting the interviews. The study was conducted at six sites in Nairobi County, Kenya. Sample and sampling: The study targeted adult TB patients receiving treatment between certain months in 2023, EPI logisticians, and TB coordinators. Eligible participants who consented were helped to fill out the electronic questionnaires, and focus group discussions (FGDs) were held later with different participants. The EPI logisticians and TB coordinators were the main targets for key informant interviews. The study excluded newly diagnosed TB patients or those initiated on TB treatment on the day of data collection, patients who refused or were unable to state their vaccination status, and patients who were too unwell to participate. The Cochran formula was used to calculate the sample size. Purposive sampling was employed to identify facilities with high TB patient numbers across the six sub-counties. Proportion sampling (PPS) was used to allocate proportions to each health facility. Simple random sampling was used to select participants for the FGDs, and purposive sampling for the Key Informant Interviews (KIIs). Data collection tools: A semi-structured electronic questionnaire was used to collect quantitative data. 
It had sections A (social demographic information and vaccination details) and B (individual-level factors and COVID-19 vaccine uptake). Pre-testing of the tool was done using 12 TB patients at [redacted], which assisted in the re-ordering of questions, replacement of ambiguous words, and Kobo installations on research assistants' phones. This site had population characteristics similar to those of the study sites. A KII guide and an FGD guide were used to collect qualitative data. Both guides have open-ended questions focusing on themes related to the study objectives. Data collection: Eight research assistants administered the semi-structured electronic questionnaire face-to-face with participants during their clinic visits. Two research assistants were allocated to [redacted] and [redacted] because of the number of participants, and one research assistant was assigned to the other four facilities. Participants provided informed consent before answering the questionnaire. Each research assistant had their own smartphone with the questionnaire accessed via the Kobo Collect application, and responses were entered as provided by the participants. Six KIIs were conducted with managers and leaders in charge of the selected TB clinics, and six FGDs were held with adult TB patients from the six TB clinics. A series of open-ended questions guided the KIIs and FGDs. The interviews and FGDs took between 20 and 60 minutes and were audio-recorded. The verbatim recordings were manually transcribed and organized for a thematic analysis approach. The results of the interviews were triangulated with quantitative data.
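The adjusted odds ratios and confidence intervals reported in this description can be related back to logistic-regression coefficients. The sketch below assumes the intervals are Wald-type intervals computed on the log-odds scale with z = 1.96, which the description does not state; the function names are hypothetical. It recovers a coefficient and standard error from one reported triple (aOR = 2.901, 95% CI 1.258-6.688) and rebuilds the interval as a consistency check.

```python
import math

def log_odds_from_report(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the coefficient (log odds ratio) and its standard error
    from a reported odds ratio and confidence interval.
    Assumes a symmetric interval on the log scale (an assumption)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def odds_ratio_ci(beta, se, z=1.96):
    """Turn a coefficient and standard error back into (OR, low, high)."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Values reported for perceived COVID-19 susceptibility.
beta, se = log_odds_from_report(2.901, 1.258, 6.688)
or_value, low, high = odds_ratio_ci(beta, se)
print(round(beta, 3), round(se, 3))
print(round(or_value, 3), round(low, 3), round(high, 3))
```

If the rebuilt bounds land close to the reported ones, the interval is consistent with the assumed Wald construction; a large gap would suggest a different method or a transcription error.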
Modeled Estimates of Altered Hydrologic Metrics for All NHDPlus v21 Reaches...
hub.arcgis.com
data.chesapeakebay.net
+1 more
Updated Feb 27, 2025
Cite
Chesapeake Geoplatform (2025). Modeled Estimates of Altered Hydrologic Metrics for All NHDPlus v21 Reaches in the Chesapeake Bay Watershed [Dataset]. https://hub.arcgis.com/documents/b682ddc8aad54de8baa43bac95c6caeb
Explore at:
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Chesapeake Geoplatform
Area covered
Chesapeake Bay
Description
Open the Data Resource: https://doi.org/10.5066/P96SAEXZ

Data are modeled estimates of flow status (inclined, diminished or indeterminant) for 12 published hydrologic metrics (HMs) that characterize main components of flow regimes (duration, frequency, magnitude, timing and rate of change). Model estimates came from random forest models independently built for each HM that predict flow status category using drainage area and previously summarized upstream catchment accumulated values (NHDPlus v2.1, 1:100,000 scale) for 15 landscape variables that describe anthropogenic stress related to urban development, agriculture, water usage and augmentation, and drainage area. HM observed data came from Eng et al. (2019), who published quantitative and categorical estimates of hydrologic alteration for each of twelve HMs for 3,355 USGS gages across the contiguous United States. Estimates included in this data release were based on a subset of 1,235 gages that were located in four aggregated Level III ecoregions within the Chesapeake Bay watershed. See Maloney et al. for a more detailed description of model approach, background data, results and application to a biological endpoint.
Data from: An updated life history scheme for marine fishes predicts...
datadryad.org
data.niaid.nih.gov
zip
Updated Dec 23, 2021
Cite
Colleen Petrik; Fernando González Taboada; Charles Stock; Jorge Sarmiento (2021). An updated life history scheme for marine fishes predicts recruitment variability and sensitivity to exploitation [Dataset]. http://doi.org/10.5061/dryad.79cnp5htj
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.79cnp5htj
Dataset updated
Dec 23, 2021
Dataset provided by
Dryad
Authors
Colleen Petrik; Fernando González Taboada; Charles Stock; Jorge Sarmiento
Time period covered
Dec 21, 2020
Description
Aim: Patterns of population renewal in marine fishes are often irregular and lead to volatile fluctuations in abundance that challenge management and conservation efforts. Here, we examine the relationship between life history strategies and recruitment variability in exploited marine fish species using a macroecological approach.

Location: Global ocean.

Time period: 1950-2018.

Major taxa studied: Bony and cartilaginous fish.

Methods: Based on trait data for 244 marine fish species, we objectively extend the established Equilibrium-Periodic-Opportunistic (E-P-O) life history classification scheme to include two additional emergent life history strategies: “Bet-hedgers” (B) and Salmonic (S) strategists. B strategists include Rockfishes and other species inhabiting patchy benthic habitats with life histories that blend characteristics of E and P species; they combine very long lifespans with elevated investments in both parental care and fecundity. S strategists are comprised of mostly...
Data from: Reviving rivers, empowering communities: Assessing the impact of...
dataverse.harvard.edu
Updated Oct 31, 2024
Cite
Kankal Yash (2024). Reviving rivers, empowering communities: Assessing the impact of rejuvenation efforts in Bodh Gaya villages [Dataset]. http://doi.org/10.7910/DVN/HXFG5C
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/HXFG5C
Dataset updated
Oct 31, 2024
Dataset provided by
Harvard Dataverse
Authors
Kankal Yash
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Gaya, Bodh Gaya
Description
Methodology

Data Collection Methods: We followed a mixed-methods approach combining qualitative thematic analysis and quantitative multivariate regression to comprehensively explore the anticipated impacts of river rejuvenation efforts in Bodh Gaya villages.

1. Qualitative Data Collection (Thematic Analysis)

Method: Open-ended questions were used to gather rich qualitative insights into participants' perceptions and expectations regarding river rejuvenation.

Data Collection: Responses were collected through in-person interviews and discussions with villagers, allowing for nuanced exploration of themes and narratives.

Analysis: Thematic analysis was conducted to identify recurring patterns and themes in participants' responses, providing qualitative depth to complement quantitative findings.

2. Quantitative Data Collection (Survey)

Method: Structured surveys were utilized, incorporating both objective and Likert scale questions to quantify participants' expectations and perceived impacts.

Data Collection: Surveys were administered in offline modes, with researchers orally presenting questions and recording responses directly from participants.

Question Types: The survey included nominal questions for categorical data and Likert scale questions to assess varying levels of agreement or expectation.

Sampling Procedures

1. Purposive Sampling (Qualitative). Selection Criteria: Participants were purposefully selected from Silaunja and Gangaar villages based on their proximity to the Niranjana River and their involvement in local community affairs.

2. Simple Random Sampling (Quantitative). Rationale: Individuals from selected villages were chosen through simple random sampling, ensuring unbiased representation and generalizability of findings to the broader population.
Data from: Aging and vulnerability: an analysis of 1,062 elderly persons
scielo.figshare.com
xls
Updated Jun 2, 2023
Cite
Rubia Rosalinn da Cruz; Vilma Beltrame; Fabiana Meneghetti Dallacosta (2023). Aging and vulnerability: an analysis of 1,062 elderly persons [Dataset]. http://doi.org/10.6084/m9.figshare.9927161.v1
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.6084/m9.figshare.9927161.v1
Dataset updated
Jun 2, 2023
Dataset provided by
SciELO journals
Authors
Rubia Rosalinn da Cruz; Vilma Beltrame; Fabiana Meneghetti Dallacosta
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract Objective: To analyze the vulnerability of non-institutionalized elderly persons. Method: A cross-sectional, descriptive and analytical study was carried out using data of the City Health Department of Palmas, Paraná, Brazil, and the Vulnerable Elders Survey (VES-13) instrument. The questionnaires of people aged over 60 years who had answered the VES-13 questionnaire between January 2016 and December 2017 were included. The quantitative data were analyzed by the Student’s T-Test and the categorical data by the Chi-square and Fisher’s Exact Test. The correlation between the quantitative variables was performed by the Pearson correlation coefficient. Results: A total of 1,062 questionnaires were analyzed, of which 57.3% were female, with a mean age 69 (±7.8) years. In total 427 individuals (40.2%) were vulnerable and 635 (59.8%) were not vulnerable according to VES-13 score. A total of 635 (59.8%) elderly persons were classified as robust, 176 (16.6%) as at risk of frailty and 251 (23.6%) as frail. Women and those over 75 years were more vulnerable (p
External Evaluation of the In Their Hands Programme (Kenya)., Round 1 -...
microdataportal.aphrc.org
Updated Oct 19, 2021
+ more versions
Cite
African Population and Health Research Centre (2021). External Evaluation of the In Their Hands Programme (Kenya)., Round 1 - Kenya [Dataset]. https://microdataportal.aphrc.org/index.php/catalog/117
Explore at:
Dataset updated
Oct 19, 2021
Dataset authored and provided by
African Population and Health Research Centre
Time period covered
2018
Area covered
Kenya
Description
Abstract

Background: Adolescent girls in Kenya are disproportionately affected by early and unintended pregnancies, unsafe abortion and HIV infection. The In Their Hands (ITH) programme in Kenya aims to increase adolescents' use of high-quality sexual and reproductive health (SRH) services through targeted interventions. The ITH programme aims to 1) promote use of contraception and testing for sexually transmitted infections (STIs), including HIV, or pregnancy among sexually active adolescent girls; 2) provide information, products and services on the adolescent girl's terms; and 3) promote community support for girls and boys to access SRH services.

Objectives: The objectives of the evaluation are to assess: a) to what extent and how the new Adolescent Reproductive Health (ARH) partnership model and integrated system of delivery is working to meet its intended objectives and the needs of adolescents; b) adolescent user experiences across key quality dimensions and outcomes; c) how ITH programme has influenced adolescent voice, decision-making autonomy, power dynamics and provider accountability; d) how community support for adolescent reproductive and sexual health initiatives has changed as a result of this programme.

Methodology: The ITH programme is being implemented in two phases: a formative planning and experimentation phase in the first year, from April 2017 to March 2018, and a national roll-out and implementation phase from April 2018 to March 2020. This second phase is informed by an Annual Programme Review and thorough benchmarking and assessment, which informed critical changes to performance and capacity so that ITH is fit for scale. It is expected that ITH will cover approximately 250,000 adolescent girls aged 15-19 in Kenya by April 2020. The programme is implemented by a consortium of Marie Stopes Kenya (MSK), Well Told Story, and Triggerise. ITH's key implementation strategies seek to increase adolescent motivation for service use; create a user-defined ecosystem and platform to provide girls with a network of accessible, subsidized and discreet SRH services; and launch and sustain a national discourse campaign around adolescent sexuality and rights. The 3-year study will employ a mixed-methods approach with multiple data sources, including secondary data and qualitative and quantitative primary data from various stakeholders, to explore their perceptions and attitudes towards adolescents' SRH services. Quantitative data analysis will be done using STATA to provide descriptive statistics and statistical associations / correlations on key variables. All qualitative data will be analyzed using NVIVO software.

Study Duration: 36 months - between 2018 and 2020.

Geographic coverage

Narok and Homa Bay counties

Analysis unit

Households

Universe

All adolescent girls aged 15-19 years resident in the household.

Sampling procedure

The sampling of adolescents for the household survey was based on expected changes in adolescents' intention to use contraception in future. According to the Kenya Demographic and Health Survey 2014, 23.8% of adolescents and young women reported not intending to use contraception in future. This was used as a baseline proportion for the intervention, as it aimed to increase demand and reduce the proportion of sexually active adolescents who did not intend to use contraception in the future. Assuming that the project was to achieve an impact of at least 2.4 percentage points in the intervention counties (i.e. a reduction by 10%), a design effect of 1.5 and a non-response rate of 10%, a sample size of 1885, estimated using Cochran's sample size formula for categorical data, was adequate to detect this difference between baseline and end line time points. Based on data from the 2009 Kenya census, there were approximately 0.46 adolescent girls per household, which meant that the study was to include approximately 4876 households from the two counties at both baseline and end line surveys.
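For readers unfamiliar with this kind of calculation, the following is a minimal sketch of Cochran's formula for a proportion with a design effect and a non-response inflation. The function name and the choice to inflate by dividing by (1 - non-response rate) are assumptions made for illustration. The description does not give the exact variant, confidence level, or margin used, so the sketch is not expected to reproduce the reported figure of 1885.

```python
import math

def cochran_sample_size(p, margin, z=1.96, design_effect=1.0, nonresponse=0.0):
    """Cochran's sample size for estimating a proportion.

    p: anticipated proportion, margin: absolute margin of error,
    z: normal quantile (1.96 for 95% confidence).
    The design-effect and non-response adjustments are one common
    convention, assumed here rather than taken from the source.
    """
    n0 = (z ** 2) * p * (1.0 - p) / (margin ** 2)   # base sample size
    n = n0 * design_effect                           # inflate for clustering
    n = n / (1.0 - nonresponse)                      # inflate for non-response
    return math.ceil(n)

# Textbook case: maximum variability, 5-point margin, simple random sample.
print(cochran_sample_size(0.5, 0.05))                                      # 385
# Same case with a design effect of 1.5 and 10% non-response.
print(cochran_sample_size(0.5, 0.05, design_effect=1.5, nonresponse=0.10)) # 641
```

Dividing the resulting number of respondents by the expected number of eligible respondents per household gives the number of households to visit, which is the step the description applies with the census figure.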

We collected data among a representative sample of adolescent girls living in both urban and rural ITH areas to understand adolescents' access to information, use of SRH services and SRH-related decision-making autonomy before the implementation of the intervention. Depending on the number of ITH health facilities in the two study counties, Homa Bay and Narok, we purposively sampled 3 sub-counties in Homa Bay (West Kasipul, Ndhiwa and Kasipul) and 3 sub-counties in Narok (Narok Town, Narok South and Narok East). In each of the ITH intervention counties, there were sub-counties that had been prioritized for the project, and our data collection focused on these sub-counties selected for intervention. A stratified sampling procedure was used to select wards within the sub-counties and villages from the wards. Then households were selected from each village after all households in the villages were listed. The purposive selection of sub-counties closer to ITH intervention facilities meant that urban and semi-urban areas were oversampled due to the concentration of health facilities in urban areas.

Qualitative Sampling

Focus Group Discussion participants were recruited from the villages where the ITH adolescent household survey was conducted in both counties. A convenience sample of consenting adults living in the villages was invited to participate in the FGDs. A facilitator and a note-taker were trained on how to use the focus group guide, how to facilitate the group to elicit the information sought, and how to take detailed notes. All focus group discussions took place in the local language and were tape-recorded, and the consent process included permission to tape-record the session. Participants were identified only by their first names and were asked not to share what was discussed outside of the focus group. Participants were read an informed consent form and asked to give written consent. In-depth interviews were conducted with a purposively selected sample of consenting adolescent girls who participated in the adolescent survey. We conducted a total of 45 in-depth interviews with adolescent girls (20 in Homa Bay County and 25 in Narok County). In addition, 8 FGDs (4 per county) were conducted with mothers of adolescent girls who are usual residents of the villages which had been identified for the interviews, and another 4 FGDs (2 per county) with CHVs.

Sampling deviation

N/A

Mode of data collection

Face-to-face [f2f] for quantitative data collection and Focus Group Discussions and In Depth Interviews for qualitative data collection

Research instrument

The questionnaire covered: socio-demographic and household information; SRH knowledge and sources of information; sexual activity and relationships; family planning knowledge, access, choice and use when needed; exposure to family planning messages; voice and decision-making autonomy; and quality of care for those who visited health facilities in the 12 months before the survey. The questionnaire was piloted before the data collection and the questions reviewed for appropriateness, comprehension and flow. The questionnaire was piloted among a sample of 42 adolescent girls aged 15-19 (two per field interviewer) from a community outside the study counties.

The questionnaire was originally developed in English and later translated into Kiswahili. The questionnaire was programmed using the ODK-based SurveyCTO platform for data collection and management and was administered through face-to-face interviews.

Cleaning operations

The survey tools were programmed using the ODK-based SurveyCTO platform for data collection and management. During programming, consistency checks were in-built into the data capture software which ensured that there were no cases of missing or implausible information/values entered into the database by the field interviewers. For example, the application included controls for variables ranges, skip patterns, duplicated individuals, and intra- and inter-module consistency checks. This reduced or eliminated errors usually introduced at the data capture stage. Once programmed, the survey tools were tested by the programming team who in conjunction with the project team conducted further testing on the application's usability, in-built consistency checks (skips, variable ranges, duplicating individuals etc.), and inter-module consistency checks. Any issues raised were documented and tracked on the Issue Tracker and followed up to full and timely resolution. After internal testing was done, the tools were availed to the project and field teams to perform user acceptance testing (UAT) so as to verify and validate that the electronic platform worked exactly as expected, in terms of usability, questions design, checks and skips etc.

Data cleaning was performed to ensure that data were free of errors and that indicators generated from these data were accurate and consistent. This process began on the first day of data collection as the first records were uploaded into the database. The data manager used data collected during pilot testing to begin writing scripts in Stata 14 to check the variables in the data in 'real-time'. This ensured the resolution of any inconsistencies that could be addressed by the data collection teams during the fieldwork activities. The Stata 14 scripts that perform real-time checks and clean data also wrote to a .rtf file that detailed every check performed against each variable, any inconsistencies encountered, and all steps that were taken to address these inconsistencies. The .rtf files also reported when a variable was
A survey of sensor network use and data management among academic ecologists...
data.niaid.nih.gov
dataone.org
+2 more
zip
Updated Aug 16, 2015
Cite
Christine M Laney; Deana D Pennington; Craig E Tweedie (2015). A survey of sensor network use and data management among academic ecologists [Dataset]. http://doi.org/10.15146/R36P4T
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.15146/R36P4T
Dataset updated
Aug 16, 2015
Authors
Christine M Laney; Deana D Pennington; Craig E Tweedie
License
https://spdx.org/licenses/CC0-1.0.html
Description
Automated sensors are ubiquitous in ecological research networks and academic-led research – the ‘long-tail’ of ecological research. We conducted a survey of academic ecologists to assess the extent of sensor use and how data are managed. Respondents were from 135 groups representing >1,800 researchers from 92 US universities; these collectively match the expenditure, sensor use, and data volumes of several large national research networks. Few reported use of metadata and workflows, and almost 70% archive data locally and not in institutional archives, though most recognized the importance of doing so. Most indicated that better access to tools and cyber expertise would enhance their research. Improving access to such datasets may include improved software tools and access to expert knowledge, targeted training, high-profile studies that showcase the participation of academic researchers in large scale syntheses, and incentives for industry to develop, adopt, or adapt technologies that improve data documentation, discovery, and sharing.

Methods

The survey solicited responses between August 2012 and July 2013 from a diverse pool of >3,800 ecologists within the US-based academic ecological research community identified via the LTER, Organization of Biological Field Sites (OBFS), and university websites. This mixed methods online survey (hosted at SurveyMonkey®; http://www.surveymonkey.com/s/ecodata) was composed of 42 quantitative, categorical, and open-ended questions (included in this data package). The survey was reviewed by the Institutional Review Board (IRB) within the Office of Research and Sponsored Projects (ORSP) at The University of Texas at El Paso and classified as exempt from IRB review. Individual identifying information was not required and, if present, was not included in analyses. The questions addressed the following topics:

1. Research group composition, study area locations, funding, and types of research conducted.
2. Data collection methods, system infrastructure, replacement costs, and ideal setup (i.e., the setup that they would like to have if resources were not limited).
3. Affiliations with research networks and perceptions of benefits or disadvantages of such affiliations.
4. Research groups’ methods of managing data and making data available to other researchers, including data and metadata formats, data archive locations, and the use of controlled vocabularies and scientific workflows.
5. Publication record, including time from data collection to publication and journal names.

Responses were downloaded on 28 June 2013 and filtered to omit highly incomplete submissions. The remainder were checked for errors and inconsistencies and summarized. Because the survey focused on a specific research community and was not paired with a follow-up survey, data were summarized without any further quantitative analysis. The number of responses for each question varied because respondents could skip questions or end the survey at any time.
Data from: Las Vegas Strip
data.mendeley.com
Updated Jul 29, 2017
Cite
Sérgio Moro (2017). Las Vegas Strip [Dataset]. http://doi.org/10.17632/tsf9sjdwh2.1
Unique identifier
https://doi.org/10.17632/tsf9sjdwh2.1
Dataset updated
Jul 29, 2017
Authors
Sérgio Moro
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Las Vegas Strip, Las Vegas
Description
This dataset includes quantitative and categorical features from online reviews of 21 hotels located on the Las Vegas Strip, extracted from TripAdvisor (http://www.tripadvisor.com). All 504 reviews were collected between January and August of 2016. The dataset contains 504 records and 20 tuned features (those marked “status = included” in Table 1 of the article mentioned below), 24 per hotel (two per month, randomly selected), regarding the year 2015.
Data from: Lending Club loan dataset for granting models
produccioncientifica.ucm.es
portalcientifico.uah.es
Updated 2024
Cite
Ariza-Garzón, Miller Janny; Sanz-Guerrero, Mario; Arroyo Gallardo, Javier; Lending Club (2024). Lending Club loan dataset for granting models [Dataset]. https://produccioncientifica.ucm.es/documentos/668fc499b9e7c03b01be2366?lang=ca
Dataset updated
2024
Authors
Ariza-Garzón, Miller Janny; Sanz-Guerrero, Mario; Arroyo Gallardo, Javier; Lending Club
Description
Lending Club offers peer-to-peer (P2P) loans through a technological platform for various personal finance purposes and is today one of the companies that dominate the US P2P lending market. The original dataset is publicly available on Kaggle and corresponds to all the loans issued by Lending Club between 2007 and 2018. The present version of the dataset is for constructing a granting model, that is, a model designed to make decisions on whether to grant a loan based on information available at the time of the loan application. Consequently, our dataset only has a selection of variables from the original one, namely the variables known at the moment the loan request is made. Furthermore, the target variable of a granting model represents the final status of the loan, which is either "default" or "fully paid". Thus, we filtered out from the original dataset all the loans in transitory states. Our dataset comprises 1,347,681 records or obligations (approximately 60% of the original) and it was also cleaned for completeness and consistency (less than 1% of our dataset was filtered out).

TARGET VARIABLE

The dataset includes a target variable based on the final resolution of the credit: the default category corresponds to the event charged off and the non-default category to the event fully paid. It does not consider other values in the loan status variable since this variable represents the state of the loan at the end of the considered time window. Thus, there are no loans in transitory states. The original dataset includes the target variable “loan status”, which contains several categories ('Fully Paid', 'Current', 'Charged Off', 'In Grace Period', 'Late (31-120 days)', 'Late (16-30 days)', 'Default'). However, in our dataset, we just consider loans that are either “Fully Paid” or “Default” and transform this variable into a binary variable called “Default”, with a 0 for fully paid loans and a 1 for defaulted loans.
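
The filtering and binarisation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the sample records are hypothetical, and treating both 'Charged Off' and 'Default' as the default event is an assumption made here because the description mentions both.

```python
# Statuses treated as final outcomes; every other status is transitory.
# Assumption: both 'Charged Off' and 'Default' count as a default event.
DEFAULT_STATUSES = {"Charged Off", "Default"}
PAID_STATUSES = {"Fully Paid"}


def to_default_flag(loan_status):
    """Return 1 for defaulted loans, 0 for fully paid loans, None otherwise."""
    if loan_status in DEFAULT_STATUSES:
        return 1
    if loan_status in PAID_STATUSES:
        return 0
    return None


def build_target(loans):
    """Drop loans in transitory states and attach the binary 'Default' field."""
    result = []
    for loan in loans:
        flag = to_default_flag(loan["loan_status"])
        if flag is not None:
            result.append({**loan, "Default": flag})
    return result


# Hypothetical sample records, for illustration only.
sample_loans = [
    {"id": 1, "loan_status": "Fully Paid"},
    {"id": 2, "loan_status": "Current"},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": "Late (31-120 days)"},
]
target_rows = build_target(sample_loans)
```

Loans 2 and 4 are dropped because their status is transitory; the remaining rows carry the binary flag.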

EXPLANATORY VARIABLES

The explanatory variables that we use correspond only to the information available at the time of the application. Variables such as the interest rate, grade, or subgrade are generated by the company as a result of a credit risk assessment process, so they were filtered out from the dataset as they must not be considered in risk models to predict the default in granting of credit.

FULL LIST OF VARIABLES

Loan identification variables:

id: Loan id (unique identifier).

issue_d: Month and year in which the loan was approved.

Quantitative variables:

revenue: Borrower's self-declared annual income during registration.

dti_n: Indebtedness ratio for obligations excluding mortgage. Monthly information. This ratio has been calculated considering the indebtedness of the whole group of applicants. It is estimated as the ratio calculated using the co-borrowers’ total payments on the total debt obligations divided by the co-borrowers’ combined monthly income.

loan_amnt: Amount of credit requested by the borrower.

fico_n: Defined between 300 and 850, reported by Fair Isaac Corporation as a risk measure based on historical credit information reported at the time of application. This value has been calculated as the average of the variables “fico_range_low” and “fico_range_high” in the original dataset.

experience_c: Binary variable that indicates whether the borrower is new to the entity. This variable is constructed from the credit date of the previous obligation in LC and the credit date of the current obligation; if the difference between dates is positive, it is not considered as a new experience with LC.

Categorical variables:

emp_length: Categorical variable with the employment length of the borrower (includes the no information category)

purpose: Credit purpose category for the loan request.

home_ownership_n: Homeownership status provided by the borrower in the registration process. Categories defined by LC: “mortgage”, “rent”, “own”, “other”, “any”, “none”. We merged the categories “other”, “any” and “none” as “other”.

addr_state: Borrower's residence state from the USA.

zip_code: Zip code of the borrower's residence.

Textual variables:

title: Title of the credit request description provided by the borrower.

desc: Description of the credit request provided by the borrower.
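
Two of the derived variables in the list above, fico_n and home_ownership_n, follow simple rules that can be sketched in a few lines. The function names, the sample record, and the lower-casing of raw category labels are assumptions made for this illustration; only the averaging rule and the merged categories come from the description.

```python
# Categories that the description says were merged into "other".
MERGED_INTO_OTHER = {"other", "any", "none"}


def fico_n(fico_range_low, fico_range_high):
    """Average of the two FICO range bounds from the original dataset."""
    return (fico_range_low + fico_range_high) / 2


def home_ownership_n(raw_category):
    """Collapse 'other', 'any' and 'none' into a single 'other' category.

    Assumption: raw labels may differ in case, so they are lower-cased first.
    """
    category = raw_category.strip().lower()
    return "other" if category in MERGED_INTO_OTHER else category


# Hypothetical raw record, for illustration only.
record = {"fico_range_low": 690, "fico_range_high": 694, "home_ownership": "ANY"}
derived = {
    "fico_n": fico_n(record["fico_range_low"], record["fico_range_high"]),
    "home_ownership_n": home_ownership_n(record["home_ownership"]),
}
```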

We cleaned the textual variables. First, we removed all those descriptions that contained the default description provided by Lending Club on its web form (“Tell your story. What is your loan for?”). Moreover, we removed the prefix “Borrower added on DD/MM/YYYY >” from the descriptions to avoid any temporal background on them. Finally, as these descriptions came from a web form, we substituted all the HTML entities by their characters (e.g. “&amp;” was substituted by “&”, “&lt;” was substituted by “<”, etc.).
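
The three cleaning steps above can be sketched with the Python standard library. This is an illustration only: the function name, the sample text, and the exact regular expression for the date prefix are assumptions, not the authors' implementation.

```python
import html
import re

DEFAULT_PROMPT = "Tell your story. What is your loan for?"
# Assumed shape of the prefix: "Borrower added on DD/MM/YYYY >"
PREFIX_PATTERN = re.compile(r"Borrower added on \d{2}/\d{2}/\d{4} >\s*")


def clean_description(desc):
    """Return the cleaned text, or None when it contains the default prompt."""
    if DEFAULT_PROMPT in desc:
        return None
    without_prefix = PREFIX_PATTERN.sub("", desc)
    # html.unescape turns entities such as &amp; and &lt; back into characters.
    return html.unescape(without_prefix).strip()


# Hypothetical raw description, for illustration only.
raw = "Borrower added on 03/12/2013 > Paying off cards &amp; a loan &lt; 2 years old"
cleaned = clean_description(raw)
```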

RELATED WORKS

This dataset has been used in the following academic articles:

Sanz-Guerrero, M., Arroyo, J. (2024). Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending. arXiv preprint arXiv:2401.16458. https://doi.org/10.48550/arXiv.2401.16458

Ariza-Garzón, M.J., Arroyo, J., Caparrini, A., Segovia-Vargas, M.J. (2020). Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access 8, 64873 - 64890. https://doi.org/10.1109/ACCESS.2020.2984412
Previous mineral-resource assessment data compilation - shapefiles
datasets.ai
data.usgs.gov
Updated May 31, 2023
Cite
Department of the Interior (2023). Previous mineral-resource assessment data compilation - shapefiles [Dataset]. https://datasets.ai/datasets/previous-mineral-resource-assessment-data-compilation-shapefiles
Dataset updated
May 31, 2023
Dataset authored and provided by
Department of the Interior
Description
The zip file contains shapefiles showing areas of mineral potential for various commodities. The datasets were compiled from previous mineral resource potential reports which covered the SaMiRA project areas. The shapefiles were compiled from datasets which had different data structure schemes and which used two different types of assessment methodology. The BLM reports used a qualitative, categorical assessment, while the others used the USGS quantitative 3-part form of assessment. The original GIS data was re-formatted so that all of the shapefiles had one of two consistent attribute table structures, one for reports that had quantitative data, and one for reports with qualitative data. A general attribute table structure was created which contained fields for information on the deposit type assessed, assessment rank, type of assessment, and tract name and identifier. For the attribute table of the quantitatively assessed reports which used the USGS 3-part form of assessment, we added additional fields for the deposit model name and number, probabilistic assessment results data, and estimators. We captured the original information as presented but also standardized nomenclature when we could and referred to the report text in some instances in order to fill in missing data in the descriptive data tables.
Data from: A quantitative taxonomic review of Fusichonetes and...
search.dataone.org
data.niaid.nih.gov
Updated Apr 6, 2025
Cite
Hui-ting Wu; Guang R. Shi; Wei-hong He (2025). A quantitative taxonomic review of Fusichonetes and Tethyochonetes (Chonetidina, Brachiopoda) [Dataset]. http://doi.org/10.5061/dryad.vb051
Unique identifier
https://doi.org/10.5061/dryad.vb051
Dataset updated
Apr 6, 2025
Dataset provided by
Dryad Digital Repository
Authors
Hui-ting Wu; Guang R. Shi; Wei-hong He
Time period covered
Jan 1, 2017
Description
Two middle Permian (Capitanian) to Early Triassic (Griesbachian) rugosochonetidae brachiopod genera, Fusichonetes Liao in Zhao et al., 1981 and Tethyochonetes Chen et al., 2000, have been regarded as two distinct taxa and used as such for a wide range of discussions including biostratigraphy, paleoecology, paleobiogeography, and the Permian-Triassic boundary mass extinction. However, the supposed morphological distinctions between the two taxa are subtle at best and appear to represent two end members of a continuum of morphological variations. In this study, we applied a range of quantitative and analytical procedures (bivariate plots, Kolmogorov-Smirnov test, categorical principal component analysis, and cladistic analysis) to a dataset of 15 quantified morphological variables, integrating both key external and internal characters, measured from 141 specimens of all well-known Fusichonetes and Tethyochonetes in order to test whether or not these two genera could be distinguished in vi...
Data from: On the Agreement between Manual and Automated Methods for...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Aug 10, 2015
Cite
Mørch, Carsten D.; Andersen, Ole K.; Manresa, José A. Biurrun; Redondo, David E. Medina; Arguissain, Federico G. (2015). On the Agreement between Manual and Automated Methods for Single-Trial Detection and Estimation of Features from Event-Related Potentials [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001873104
Dataset updated
Aug 10, 2015
Authors
Mørch, Carsten D.; Andersen, Ole K.; Manresa, José A. Biurrun; Redondo, David E. Medina; Arguissain, Federico G.
Description
The agreement between humans and algorithms on whether an event-related potential (ERP) is present or not and the level of variation in the estimated values of its relevant features are largely unknown. Thus, the aim of this study was to determine the categorical and quantitative agreement between manual and automated methods for single-trial detection and estimation of ERP features. To this end, ERPs were elicited in sixteen healthy volunteers using electrical stimulation at graded intensities below and above the nociceptive withdrawal reflex threshold. Presence/absence of an ERP peak (categorical outcome) and its amplitude and latency (quantitative outcome) in each single-trial were evaluated independently by two human observers and two automated algorithms taken from existing literature. Categorical agreement was assessed using percentage positive and negative agreement and Cohen’s κ, whereas quantitative agreement was evaluated using Bland-Altman analysis and the coefficient of variation. Typical values for the categorical agreement between manual and automated methods were derived, as well as reference values for the average and maximum differences that can be expected if one method is used instead of the others. Results showed that the human observers presented the highest categorical and quantitative agreement, and there were significantly large differences between detection and estimation of quantitative features among methods. In conclusion, substantial care should be taken in the selection of the detection/estimation approach, since factors like stimulation intensity and expected number of trials with/without response can play a significant role in the outcome of a study.
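
The categorical agreement measures named above (percentage positive and negative agreement and Cohen's κ) can be computed from two binary detection vectors. The sketch below uses the standard textbook definitions; the function names and the sample detections are hypothetical and are not taken from the study.

```python
def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero (degenerate tables)."""
    return numerator / denominator if denominator else float("nan")


def agreement_stats(rater_a, rater_b):
    """Positive agreement, negative agreement and Cohen's kappa for two binary raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of trials")
    n = len(rater_a)
    both_yes = sum(1 for a, b in zip(rater_a, rater_b) if a and b)
    both_no = sum(1 for a, b in zip(rater_a, rater_b) if not a and not b)
    disagreements = n - both_yes - both_no

    positive = _ratio(2 * both_yes, 2 * both_yes + disagreements)
    negative = _ratio(2 * both_no, 2 * both_no + disagreements)

    observed = (both_yes + both_no) / n
    p_yes_a = sum(1 for a in rater_a if a) / n
    p_yes_b = sum(1 for b in rater_b if b) / n
    # Agreement expected by chance if the two raters were independent.
    expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    kappa = _ratio(observed - expected, 1 - expected)
    return positive, negative, kappa


# Hypothetical single-trial detections (1 = ERP peak present, 0 = absent).
manual = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
automated = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
positive, negative, kappa = agreement_stats(manual, automated)
```

With these sample vectors the two methods agree on 8 of 10 trials, which gives positive and negative agreement of 0.8 each and a κ of about 0.6.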
DEEPEN Global Standardized Categorical Exploration Datasets for Magmatic...
catalog.data.gov
data.openei.org
Updated Jan 20, 2025
Cite
National Renewable Energy Laboratory (2025). DEEPEN Global Standardized Categorical Exploration Datasets for Magmatic Plays [Dataset]. https://catalog.data.gov/dataset/deepen-global-standardized-categorical-exploration-datasets-for-magmatic-plays-f1ecf
Dataset updated
Jan 20, 2025
Dataset provided by
National Renewable Energy Laboratory
Description
DEEPEN stands for DE-risking Exploration of geothermal Plays in magmatic ENvironments. As part of the development of the DEEPEN 3D play fairway analysis (PFA) methodology for magmatic plays (conventional hydrothermal, superhot EGS, and supercritical), weights needed to be developed for use in the weighted sum of the different favorability index models produced from geoscientific exploration datasets. This was done using two different approaches: one based on expert opinions, and one based on statistical learning. This GDR submission includes the datasets used to produce the statistical learning-based weights. While expert opinions allow us to include more nuanced information in the weights, expert opinions are subject to human bias. Data-centric or statistical approaches help to overcome these potential human biases by focusing on and drawing conclusions from the data alone. The drawback is that, to apply these types of approaches, a dataset is needed. Therefore, we attempted to build comprehensive standardized datasets mapping anomalies in each exploration dataset to each component of each play. This data was gathered through a literature review focused on magmatic hydrothermal plays along with well-characterized areas where superhot or supercritical conditions are thought to exist. Datasets were assembled for all three play types, but the hydrothermal dataset is the least complete due to its relatively low priority. For each known or assumed resource, the dataset states what anomaly in each exploration dataset is associated with each component of the system. The data is only semi-quantitative, where values are either high, medium, or low, relative to background levels. In addition, the dataset has significant gaps, as not every possible exploration dataset has been collected and analyzed at every known or suspected geothermal resource area, in the context of all possible play types.

The following training sites were used to assemble this dataset:
- Conventional magmatic hydrothermal: Akutan (from AK PFA), Oregon Cascades PFA, Glass Buttes OR, Mauna Kea (from HI PFA), Lanai (from HI PFA), Mt St Helens Shear Zone (from WA PFA), Wind River Valley (from WA PFA), Mount Baker (from WA PFA).
- Superhot EGS: Newberry (EGS demonstration project), Coso (EGS demonstration project), Geysers (EGS demonstration project), Eastern Snake River Plain (EGS demonstration project), Utah FORGE, Larderello, Kakkonda, Taupo Volcanic Zone, Acoculco, Krafla.
- Supercritical: Coso, Geysers, Salton Sea, Larderello, Los Humeros, Taupo Volcanic Zone, Krafla, Reykjanes, Hengill.

Disclaimer: Treat the supercritical fluid anomalies with skepticism. They are based on assumptions due to the general lack of confirmed supercritical fluid encounters and samples at the sites included in this dataset, at the time of assembling the dataset. The main assumption was that the supercritical fluid in a given geothermal system has shared properties with the hydrothermal fluid, which may not be the case in reality.

Once the datasets were assembled, principal component analysis (PCA) was applied to each. PCA is an unsupervised statistical learning technique, meaning that labels are not required on the data, that summarizes the directions of variance in the data. This approach was chosen because our labels are not certain, i.e., we do not know with 100% confidence that superhot resources exist at all the assumed positive areas. We also do not have data for any known non-geothermal areas, meaning that it would be challenging to apply a supervised learning technique. In order to generate weights from the PCA, an analysis of the PCA loading values was conducted. PCA loading values represent how much a feature is contributing to each principal component, and therefore the overall variance in the data.
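
To make the idea of reading weights off PCA loadings concrete, here is a small standard-library sketch. Everything specific in it is an assumption made for illustration: the 1/2/3 encoding of low/medium/high, the feature names, the sample sites, and the rule that turns loadings into weights. The description does not state how the DEEPEN weights were actually derived from the loadings.

```python
import math

# Assumed ordinal encoding of the semi-quantitative levels (not from the source).
LEVELS = {"low": 1.0, "medium": 2.0, "high": 3.0}


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=1000):
    """Unit-length loading vector of the leading component, by power iteration."""
    d = len(cov)
    vector = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-zero start
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("data has no variance")
        vector = [x / norm for x in product]
    return vector


# Hypothetical anomaly table: one row per site, one column per exploration dataset.
features = ["resistivity", "seismicity", "heat_flow"]
sites = [
    ["high", "medium", "high"],
    ["high", "low", "high"],
    ["medium", "medium", "low"],
    ["low", "high", "low"],
    ["high", "low", "medium"],
]
encoded = [[LEVELS[level] for level in row] for row in sites]
loadings = first_component(covariance_matrix(encoded))

# One possible weighting rule (an assumption): share of absolute loading per feature.
total = sum(abs(x) for x in loadings)
weights = {name: abs(x) / total for name, x in zip(features, loadings)}
```

A fuller analysis would look at several components and weight them by explained variance; this sketch only uses the first one to keep the example short.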
German Credit Data
kaggle.com
Updated Jun 23, 2022
Cite
Mehtap Çetintaş (2022). German Credit Data [Dataset]. https://www.kaggle.com/datasets/mehtapcetintas/germancredit/code
Dataset updated
Jun 23, 2022
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Mehtap Çetintaş
Description
Description of the German credit dataset.

Title: German Credit data

Source Information

Professor Dr. Hans Hofmann
Institut für Statistik und Ökonometrie
Universität Hamburg
FB Wirtschaftswissenschaften
Von-Melle-Park 5
2000 Hamburg 13

Number of Instances: 1000

Two datasets are provided. The original dataset, in the form provided by Prof. Hofmann, contains categorical/symbolic attributes and is in the file "german.data".

For algorithms that need numerical attributes, Strathclyde University produced the file "german.data-numeric". This file has been edited and several indicator variables added to make it suitable for algorithms which cannot cope with categorical variables. Several attributes that are ordered categorical (such as attribute 17) have been coded as integer. This was the form used by StatLog.

Number of Attributes german: 20 (7 numerical, 13 categorical)
Number of Attributes german.numer: 24 (24 numerical)

Attribute description for german

Attribute 1: (qualitative) Status of existing checking account
A11 : ... < 0 DM
A12 : 0 <= ... < 200 DM
A13 : ... >= 200 DM / salary assignments for at least 1 year
A14 : no checking account

Attribute 2: (numerical) Duration in month

Attribute 3: (qualitative) Credit history
A30 : no credits taken/ all credits paid back duly
A31 : all credits at this bank paid back duly
A32 : existing credits paid back duly till now
A33 : delay in paying off in the past
A34 : critical account/ other credits existing (not at this bank)

Attribute 4: (qualitative) Purpose
A40 : car (new)
A41 : car (used)
A42 : furniture/equipment
A43 : radio/television
A44 : domestic appliances
A45 : repairs
A46 : education
A47 : (vacation - does not exist?)
A48 : retraining
A49 : business
A410 : others

Attribute 5: (numerical) Credit amount

Attribute 6: (qualitative) Savings account/bonds
A61 : ... < 100 DM
A62 : 100 <= ... < 500 DM
A63 : 500 <= ... < 1000 DM
A64 : ... >= 1000 DM
A65 : unknown/ no savings account

Attribute 7: (qualitative) Present employment since
A71 : unemployed
A72 : ... < 1 year
A73 : 1 <= ... < 4 years
A74 : 4 <= ... < 7 years
A75 : ... >= 7 years

Attribute 8: (numerical) Installment rate in percentage of disposable income

Attribute 9: (qualitative) Personal status and sex
A91 : male : divorced/separated
A92 : female : divorced/separated/married
A93 : male : single
A94 : male : married/widowed
A95 : female : single

Attribute 10: (qualitative) Other debtors / guarantors
A101 : none
A102 : co-applicant
A103 : guarantor

Attribute 11: (numerical) Present residence since

Attribute 12: (qualitative) Property
A121 : real estate
A122 : if not A121 : building society savings agreement/ life insurance
A123 : if not A121/A122 : car or other, not in attribute 6
A124 : unknown / no property

Attribute 13: (numerical) Age in years

Attribute 14: (qualitative) Other installment plans
A141 : bank
A142 : stores
A143 : none

Attribute 15: (qualitative) Housing
A151 : rent
A152 : own
A153 : for free

Attribute 16: (numerical) Number of existing credits at this bank

Attribute 17: (qualitative) Job
A171 : unemployed/ unskilled - non-resident
A172 : unskilled - resident
A173 : skilled employee / official
A174 : management/ self-employed/ highly qualified employee/ officer

Attribute 18: (numerical) Number of people being liable to provide maintenance for

Attribute 19: (qualitative) Telephone
A191 : none
A192 : yes, registered under the customers name

Attribute 20: (qualitative) foreign worker
A201 : yes
A202 : no

Cost Matrix

This dataset requires use of a cost matrix (see below)

              Predicted 1   Predicted 2
Actual 1           0             1
Actual 2           5             0

(1 = Good, 2 = Bad)

the rows represent the actual classification and the columns the predicted classification.

It is worse to class a customer as good when they are bad (5), than it is to class a customer as bad when they are good (1).
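
A short sketch shows how the cost matrix can be applied when scoring a classifier. The function name and the sample labels are hypothetical; only the four cost values and the row/column convention come from the description above.

```python
# COST[actual][predicted]; 1 = Good, 2 = Bad, as in the cost matrix above.
COST = {
    1: {1: 0, 2: 1},  # actual Good: predicting Bad costs 1
    2: {1: 5, 2: 0},  # actual Bad: predicting Good costs 5
}


def total_cost(actual, predicted):
    """Sum the misclassification costs over paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(COST[a][p] for a, p in zip(actual, predicted))


# Hypothetical labels, for illustration only.
actual = [1, 1, 2, 2, 1]
predicted = [1, 2, 1, 2, 1]
cost = total_cost(actual, predicted)                      # one 1-cost and one 5-cost error
cost_all_good = total_cost(actual, [1] * len(actual))     # both Bad customers accepted
```

Comparing classifiers by this total rather than by plain accuracy reflects the asymmetry: accepting a bad customer is penalised five times as heavily as rejecting a good one.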
Implications of the National Offender Management Service for Prison...
b2find.eudat.eu
Updated Oct 23, 2023
Cite
(2023). Implications of the National Offender Management Service for Prison Officers, 2006 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/3b0d7e91-09ae-5f42-8795-f1b73601ed17
Dataset updated
Oct 23, 2023
Description
Abstract copyright UK Data Service and data collection copyright owner. The National Offender Management Service (NOMS) was introduced in June 2004. It is intended to integrate the prison and probation services and to provide an operational framework for the 'end-to-end management' of offenders throughout custodial and community elements of their sentences. It also introduces a 'purchaser-provider split' in the delivery of correctional services. The main objective of this mixed methodology study was to explore the perspectives and experiences of frontline prison staff regarding the transition to NOMS. Semi-structured interviews with prison officers and governing staff were carried out in 23 prisons, and demographic and other quantitative data collected. As well as documenting this key development in the history of the prison service and its perceived impact on practice, the research was focused on issues of interest to senior managers and those responsible for implementing change. Main Topics: This mixed methodology dataset comprises qualitative semi-structured interviews with 64 prison officers and 23 prison governors, drawn from 23 prisons (spread over seven Prison Service Areas), and one quantitative data file. The quantitative data comprise (non-identifying) descriptive information about participants and categorical answers to interview questions. The qualitative interview transcripts cover five main areas: personal and professional identity; communication regarding the introduction of NOMS; knowledge of NOMS; perceived implications of NOMS; local and personal experiences of prison policy. The quantitative data and qualitative interview transcript data can be linked by ID number. The data have been anonymised to disguise the prisons and Prison Service Areas respondents work in. Volunteer sample Face-to-face interview Compilation or synthesis of existing material quantitative data transcribed from information collected during the interview.
Understanding Society Teaching Datasets: Waves 1-3, 2009-2012 - Dataset -...
b2find.eudat.eu
Updated Feb 16, 2024
Cite
(2024). Understanding Society Teaching Datasets: Waves 1-3, 2009-2012 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/46b42e55-e61f-545b-b995-0c95661e2bd6
Dataset updated
Feb 16, 2024
Description
Abstract copyright UK Data Service and data collection copyright owner. The aim of this project was to make easier-to-handle teaching datasets from Understanding Society, building a core of common variables from the first three waves but including some other wave-specific information from each wave. The richness of the data, including the combination of ratio/interval data as well as categorical/ordinal data, makes for an effective dataset from which to teach quantitative methods of all kinds. A distinctive feature is the provision of datasets in native R format in addition to those in a statistical format. The documentation for the 'parent' Understanding Society dataset (SN 6614 at the UK Data Service) is the definitive guide to the variables, sampling, etc. The dataset was compiled under the ESRC-funded Understanding Society Through Secondary Data Analysis: Quantitative Methods over the Undergraduate Life Course award. This project aimed to enhance the capacity of social science undergraduates to understand and use numeric data in their studies, by generating teaching-ready datasets based on Understanding Society. It also aimed to create associated digital resources that will be available for other students to use; to increase the amount of work using secondary data at the undergraduate level and to underline the relevance of research methods to the study of social sciences. As part of the award, a programme of new courses was planned, taking undergraduate students from their first year through to their final year. This new approach aimed to provide a solid foundation for employability and future careers using quantitative skills (whether in the public sector, in academia, or elsewhere). Main Topics:
Morphology phenotype data of 88 types of galls induced on oak trees by...
gimi9.com
Updated Jun 2, 2025
Cite
(2025). Morphology phenotype data of 88 types of galls induced on oak trees by cynipid gallwasps from 6 sites in Hungary, 2000-2003 | gimi9.com [Dataset]. https://gimi9.com/dataset/uk_morphology-phenotype-data-of-88-types-of-galls-induced-on-oak-trees-by-cynipid-gallwa-2000-2003/
Dataset updated
Jun 2, 2025
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
These data consist of quantitative and categorical scores for phenotypic attributes of 88 types of galls induced on oak trees (Quercus spp.) by cynipid gallwasps (Hymenoptera: Cynipidae: Cynipini). The recorded variables focus on attributes such as hardness, presence of surface spines or coatings of sticky resin, each of which is thought to contribute to protection of the gall inhabitants from attack by natural enemies such as parasitoid wasps and birds. Cynipid galls have separate sexual and asexual generation galls, each with different phenotypes. The dataset comprises values for 31 sexual generation galls and 58 asexual generation galls of a total of 69 cynipid species. The biological rationale for regarding these phenotypic traits as defences is explained in Bailey et al (2009). The purpose of these data is to include them as explanatory variables in statistical analyses that seek to quantify the effects of gall traits on the composition and abundance of parasitoid natural enemies in cynipid gall communities. Full details about this dataset can be found at https://doi.org/10.5285/bc10f720-2bb6-4ff4-ad63-257663fd41a3
