The Global Ocean Data Analysis Project (GLODAP) is a cooperative effort to coordinate global synthesis projects funded through NOAA/DOE and NSF as part of the Joint Global Ocean Flux Study - Synthesis and Modeling Project (JGOFS-SMP). Cruises conducted as part of the World Ocean Circulation Experiment (WOCE), Joint Global Ocean Flux Study (JGOFS) and NOAA Ocean-Atmosphere Exchange Study (OACES) over the decade of the 1990s have created an oceanographic database of unparalleled quality and quantity. These data provide an important asset to the scientific community investigating carbon cycling in the oceans.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study analyzes the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

Methods
This article examines how learners perform, and what their perspectives are, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, comprising students and professionals without computational thinking skills, who developed a data analytics project in the context of a short data analytics session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

Results
The results show the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Our findings also suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Finally, our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics.
We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

Discussion
It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT; however, users still need advanced programming knowledge to properly configure this connection via an API. There is a significant opportunity for generative AI tools to improve their performance by providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates Gate household income by age. It can be utilized to understand the age-based distribution of household income in Gate.
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of the Gate income distribution by age. You can refer to the same here.
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. 
Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note, the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed more to improvements in the completeness and accuracy of the spatial data than to actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data.
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
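The overlap prioritisation described above (biodiversity conservation status first, then public access, then geodatabase load order) can be sketched in Python. The field names and record values below are illustrative assumptions, not the actual PAD-US schema or scripted process:

```python
# Sketch of the overlap-prioritisation logic described above.
# Field names ("gap_status", "access", "load_order") are assumptions.
ACCESS_ORDER = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}

# Two hypothetical overlapping designations.
records = [
    {"name": "National Forest", "gap_status": 2, "access": "Open", "load_order": 1},
    {"name": "Wilderness", "gap_status": 1, "access": "Restricted", "load_order": 2},
]

def priority(rec):
    # Lower tuple sorts first: best conservation status (GAP 1 over 2),
    # then access in the order Closed, Restricted, Open, Unknown,
    # then geodatabase load order.
    return (rec["gap_status"], ACCESS_ORDER[rec["access"]], rec["load_order"])

# The Wilderness designation wins the overlap with the National Forest.
winner = min(records, key=priority)
```

In a real workflow this comparison would be applied to overlapping polygon geometries; here it only shows the ordering rule.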
This statistic shows the results of a survey on data-driven projects, either planned or implemented, among technology magazine readers. In 2015, 27 percent of respondents indicated that their companies had already deployed or implemented a data-driven project. Fewer than a third of respondents said their companies had no plans to deploy or implement a data-driven project.
We have applied 3D shape-based retrieval to various disciplines such as computer vision, CAD/CAM, computer graphics, molecular biology and 3D anthropometry. We have organized two workshops on 3D shape retrieval and two shape retrieval contests. We have also developed 3D shape benchmarks, performance evaluation software and prototype 3D retrieval systems, as well as a robotic map quality assessment tool in collaboration with MEL. We have also developed different shape descriptors to represent 3D human bodies and heads efficiently, and other work related to 3D anthropometry. Finally, we have also done some work in structural bioinformatics and bio-image analysis and retrieval.
DESCRIPTION
Create a model that predicts whether or not a loan will default using the historical data.
Problem Statement:
For companies like LendingClub, correctly predicting whether or not a loan will default is very important. In this project, using historical data from 2007 to 2015, you have to build a deep learning model to predict the chance of default for future loans. As you will see later, this dataset is highly imbalanced and includes many features, which makes this problem more challenging.
Domain: Finance
Analysis to be done: Perform data preprocessing and build a deep learning prediction model.
Content:
Dataset columns and definition:
credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").
int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.
installment: The monthly installments owed by the borrower if the loan is funded.
log.annual.inc: The natural log of the self-reported annual income of the borrower.
dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
fico: The FICO credit score of the borrower.
days.with.cr.line: The number of days the borrower has had a credit line.
revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.
delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).
Steps to perform:
Perform exploratory data analysis and feature engineering, then build a deep learning model to predict whether or not a loan will default using the historical data.
Tasks:
Transform categorical values into numerical values (discrete)
Exploratory data analysis of different factors of the dataset.
Additional Feature Engineering
You will check the correlation between features and will drop those features which have a strong correlation
This will help reduce the number of features and will leave you with the most relevant features
After applying EDA and feature engineering, you are now ready to build the predictive models
In this part, you will create a deep learning model using Keras with Tensorflow backend
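The preprocessing tasks above can be sketched as follows. The synthetic frame stands in for the real LendingClub data, and the target name "not.fully.paid" is an assumed placeholder, since the label column is not listed in the dataset description:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loan data; column names follow the dataset
# description above, and "not.fully.paid" is an ASSUMED target name.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "credit.policy": rng.integers(0, 2, n),
    "purpose": rng.choice(["credit_card", "debt_consolidation", "educational"], n),
    "int.rate": rng.uniform(0.06, 0.22, n),
    "fico": rng.integers(600, 850, n),
    "not.fully.paid": rng.integers(0, 2, n),
})

# Task 1: transform the categorical 'purpose' column into numerical indicators.
df = pd.get_dummies(df, columns=["purpose"], drop_first=True)

# Additional feature engineering: drop one feature from every pair with a
# strong correlation (here |r| > 0.9), keeping the most relevant features.
X = df.drop(columns=["not.fully.paid"]).astype(float)
mask = np.triu(np.ones((X.shape[1], X.shape[1]), dtype=bool), k=1)
upper = X.corr().abs().where(mask)
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
X = X.drop(columns=to_drop)

# X is now ready to feed into a Keras model (e.g. Dense layers with a
# sigmoid output trained with binary cross-entropy), which is omitted here.
```

The Keras model itself is left out of the sketch; only the EDA and feature-engineering steps are shown.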
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hypothesis: Reliability can be adopted to quantitatively measure the sustainability of megaprojects.
Presentation: This dataset shows two scenario-based examples to establish an initial reliability assessment of megaproject sustainability. Data were generated from the author's assumptions regarding differences between scenarios A and B. There are two sheets in this Microsoft Excel file: a comparison between the two scenarios using a Fault Tree Analysis model, and a correlation analysis between reliability and unavailability.
Notable findings: This exploratory experiment found that reliability can be used to quantitatively measure megaproject sustainability, and that there is a negative correlation between reliability and unavailability among 11 related events associated with sustainability goals in the life cycle of a megaproject.
Interpretation: Results from data analysis using the two sheets can be useful to inform decision making on megaproject sustainability. For example, the reliability of achieving sustainability goals can be enhanced by decreasing the unavailability or failure at individual work stages in megaproject delivery.
Implication: This dataset file can be used to perform reliability analysis in other experiments to assess megaproject sustainability.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Maddison Project Dataset 2020 Population by Region’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathurinache/maddison-project-dataset-2020-population-by-region on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The Maddison Project Database provides information on comparative economic growth and income levels over the very long run. The 2020 version of this database covers 169 countries and the period up to 2018. For questions not covered in the documentation, please contact maddison@rug.nl.
We now offer a new 2020 update of the Maddison Project database, which uses a different methodology compared to the 2018 update. The approach of the 2018 update is identical to that of Penn World Tables, and consistent with recent economic and statistical research in this field. However, applying this approach systematically results in historical outcomes that are not consistent with current insights by economic historians, as explained in Bolt and Van Zanden (2020).
The 2020 update has to some extent gone back to the original Maddison approach to remedy this (see documentation). Both the 2018 and 2020 datasets incorporate the available recent work by economic historians on long-term economic growth; the 2020 version is the most complete in this respect.
Attribution requirement -
All original papers must be cited when:
a) the data is shown in any graphical form, or
b) subsets of the full dataset that include less than a dozen (12) countries are used for statistical analysis or any other purposes.
A list of original papers can be found in the source sheet of the database. When neither a) nor b) applies, the MPD as a whole should be cited.
Maddison Project Database, version 2020. Bolt, Jutta and Jan Luiten van Zanden (2020), “Maddison style estimates of the evolution of the world economy. A new 2020 update”.
You can find some inspiration here : https://ourworldindata.org/global-economic-inequality-introduction
--- Original source retains full ownership of the source dataset ---
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
About Datasets:
Domain: Marketing
Project: App Review Text Analysis
Dataset: linkedin-reviews
Dataset Type: Excel Data
Dataset Size: 1k+ records
KPIs:
1. Distribution of Ratings
2. Distribution of Review Lengths
3. Distribution of Sentiments
4. Sentiment Distribution across Ratings
Process:
1. Understanding the problem
2. Data Collection
3. Perform EDA by analyzing the length of the reviews and their ratings
4. Label the sentiment data using tools like TextBlob or NLTK
5. Understanding the overall distribution of sentiments (positive, negative, neutral) in the dataset
6. Explore the relationship between the sentiments and the ratings given
7. Analyze the text of the reviews to identify common themes or words in different sentiment categories
8. Interpreting the results
The analysis uses pandas, matplotlib, and seaborn, with functions and attributes such as countplot, histplot, textblob.sentiment, sentiment.polarity, value_counts, barplot, hue, .join, wordcloud, imshow, interpolation, figsize, and generate_word_cloud(sentiment).
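The sentiment-labelling and distribution steps of the process can be sketched as follows. The tiny word lists here are a toy stand-in for TextBlob (a real pipeline would call TextBlob(text).sentiment.polarity), and the three reviews are fabricated for illustration:

```python
import pandas as pd

# Toy lexicon as a stand-in for TextBlob's polarity scoring (assumption).
POSITIVE = {"great", "good", "love", "excellent", "useful"}
NEGATIVE = {"bad", "poor", "terrible", "spam"}

def polarity(text: str) -> float:
    # Crude polarity in [-1, 1]: positive minus negative word hits per word.
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / max(len(words), 1)

def label(p: float) -> str:
    return "positive" if p > 0 else ("negative" if p < 0 else "neutral")

# Fabricated sample reviews standing in for the linkedin-reviews data.
reviews = pd.DataFrame({
    "review": ["Great app, love it", "Too many ads, terrible", "It works"],
    "rating": [5, 2, 3],
})
reviews["length"] = reviews["review"].str.len()               # review lengths
reviews["sentiment"] = reviews["review"].map(polarity).map(label)

sentiment_counts = reviews["sentiment"].value_counts()        # sentiment distribution
by_rating = pd.crosstab(reviews["sentiment"], reviews["rating"])  # sentiment vs. rating
```

The resulting counts and cross-tabulation would feed the countplot/barplot visualisations mentioned above.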
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
European Marine Energy Centre (EMEC) wildlife observation data has been collected from the marine renewable test sites at Billia Croo and Fall of Warness in Orkney. These data have been processed and cleansed and used in reports prepared by EMEC, for SNH and Marine Scotland.
This table contains a new catalog of 5106 infrared bubbles created through visual classification via the online citizen science website 'The Milky Way Project' (MWP). Bubbles in the new catalog have been independently measured by at least five individuals, producing consensus parameters for their positions, radii, thicknesses, eccentricities and position angles. Citizen scientists - volunteers recruited online and taking part in this research - have independently rediscovered the locations of at least 86% of three widely used catalogs of bubbles and H II regions while finding an order of magnitude more objects. 29% of the bubbles in the Milky Way Project catalog lie on the rim of a larger bubble, or have smaller bubbles located within them, opening up the possibility of better statistical studies of triggered star formation. This online resource of the Milky Way Project provides a crowd-sourced map of bubbles and arcs in the Milky Way, and will enable better statistical analysis of Galactic star formation sites. This table is the first data release of the MWP IR Bubble Catalog: the authors anticipate a future release of a second, refined catalog incorporating better data-reduction techniques. This table was created by the HEASARC in March 2013 based on the CDS Catalog J/MNRAS/424/2442 files mwplarge.dat and mwpsmall.dat. This is a service provided by NASA HEASARC.
The purpose of the coalbed methane geostatistical study was to identify correlations between geologic parameters and gas production for wells completed in the Oak Grove field in Alabama. The study area consisted of 23 wells in the primary degasification grid and 15 wells adjacent to the grid. All data obtained from reports from lineament maps were screened for consistency and accuracy prior to any analyses being performed. The primary analysis of the data consisted of multivariate statistical procedures. The intent of the analysis was to provide information on what effects various geologic and lineament variables had on gas production. It was also intended that the variables having the most important effects on gas production be identified and used for predicting production categories. Principal components analysis was used to establish the well groupings. Results indicated that wells could be grouped based upon two primary criteria: (1) the well's overall production and (2) a comparison of the early production of a well relative to the production realized later in its life. In effect, four unique groupings could be formed based upon the production profiles of the wells. Results of stepwise discriminant analysis, in which the four production categories were examined, indicated that four variables were considered to be important: well elevation, number of lineament intersections within 250 feet, thickness of the Blue Creek coal seam, and the length of the nearest lineament. A means by which a well could be classified into one of the production categories based upon measurements of the four more important variables was also developed. Results of the classification of the wells inside the degasification grid were encouraging, with 91% of the wells being correctly classified. When the classification methods developed based upon wells inside the primary degasification grid were applied to the 15 wells outside the grid, the results were not as encouraging.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘PoroTomo Project: Brady's Geothermal Field, Subtask 3.5: GPS Data Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/c6e77e20-1cab-4053-894a-b0edcb1df117 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
Links to GPS RINEX data not previously reported, plus links to station web pages, which include most up-to-date time-series
--- Original source retains full ownership of the source dataset ---
Analysis of the projects proposed by the seven finalists in USDOT's Smart City Challenge, including the challenge addressed, proposed project category, and project description. The times reported for the speed profiles are between 2:00 PM and 8:00 PM, in increments of 10 minutes.
Global Ocean Data Analysis Project for Carbon (GLODAP)

_NCProperties=version=1|netcdflibversion=4.6.1|hdf5libversion=1.8.12
cdm_data_type=Profile
cdm_profile_variables=time,latitude,longitude
citation=These data were collected and made freely available by the Copernicus project and the programs that contribute to it. Cite as Olsen, A.; Key, R. M.; van Heuven, S.; Lauvset, S. K.; Velo, A.; Lin, X.; Schirnick, C.; Kozyr, A.; Tanhua, T.; Hoppema, M.; Jutterström, S.; Steinfeldt, R.; Jeansson, E.; Ishii, M.; Pérez, F. F.; and Suzuki, T. (2016). The Global Ocean Data Analysis Project version 2 (GLODAPv2) – an internally consistent data product for the world ocean, Earth Syst. Sci. Data, 8, 297-323, https://doi.org/10.5194/essd-8-297-2016
Conventions=CF-1.6, COARDS, ACDD-1.3
data_type=OceanSITES vertical profile
Easternmost_Easting=-60.4437
featureType=Profile
geospatial_lat_max=44.6973
geospatial_lat_min=32.93
geospatial_lat_units=degrees_north
geospatial_lon_max=-60.4437
geospatial_lon_min=-76.36
geospatial_lon_units=degrees_east
geospatial_vertical_max=4986.0
geospatial_vertical_min=0.0
geospatial_vertical_positive=down
geospatial_vertical_units=m
history=Created by Eli Hunter (hunter@marine.rutgers.edu), 25-Mar-2020 16:16:27
infoUrl=http://marine.copernicus.eu
institution=COPERNICUS MARINE
keywords_vocabulary=GCMD Science Keywords
Northernmost_Northing=44.6973
references=http://marine.copernicus.eu https://www.glodap.info
sourceUrl=Myocean
Southernmost_Northing=32.93
standard_name_vocabulary=CF Standard Name Table v55
subsetVariables=time
time_coverage_end=2012-03-29T20:12:11Z
time_coverage_start=1981-04-02T00:00:00Z
Type=GLODAP Observation File: DOPPIO DOMAIN
Westernmost_Easting=-76.36
The research was envisaged to last 30 months. One consequence of the pandemic was that initial access in early 2020 was challenging, and we sought an extension to 36 months. Hence the project began in early 2020 and ran until the end of 2022. There were two phases to the project. Phase one entailed a Qualitative Comparative Analysis (QCA) to analyse conditions across 10 cases of social innovation in and around MNCs. Phase two consisted of semi-structured interviews examining four research questions related to social innovation: i) interests and motivations of social innovation; ii) skills and resources of social innovation; iii) inhibiting and enabling institutional factors of social innovation; and iv) outcomes of social innovation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns and also to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,264 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.
The main dataset is provided as a 17,264 record tab-separated file named enterprise_projects.txt with the following 29 fields.
url: the project's GitHub URL
project_id: the project's GHTorrent identifier
sdtc: true if selected using the same domain top committers heuristic (9,016 records)
mcpc: true if selected using the multiple committers from a probable company heuristic (8,314 records)
mcve: true if selected using the multiple committers from a valid enterprise heuristic (8,015 records)
star_number: number of GitHub watchers
commit_count: number of commits
files: number of files in current main branch
lines: corresponding number of lines in text files
pull_requests: number of pull requests
github_repo_creation: timestamp of the GitHub repository creation
earliest_commit: timestamp of the earliest commit
most_recent_commit: date of the most recent commit
committer_count: number of different committers
author_count: number of different authors
dominant_domain: the project's dominant email domain
dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain
dominant_domain_author_commits: corresponding number for commit authors
dominant_domain_committers: number of committers whose email matches the project's dominant domain
dominant_domain_authors: corresponding number for commit authors
cik: SEC's EDGAR "central index key"
fg500: true if this is a Fortune Global 500 company (2,233 records)
sec10k: true if the company files SEC 10-K forms (4,180 records)
sec20f: true if the company files SEC 20-F forms (429 records)
project_name: GitHub project name
owner_login: GitHub project's owner login
company_name: company name as derived from the SEC and Fortune 500 data
owner_company: GitHub project's owner company name
license: SPDX license identifier
The file cohort_project_details.txt provides the full set of 311,223 cohort projects that are not part of the enterprise data set but have comparable quality attributes.
url: the project's GitHub URL
project_id: the project's GHTorrent identifier
stars: number of GitHub watchers
commit_count: number of commits
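Both files are tab-separated and can be loaded with pandas; a minimal sketch follows, where the inlined sample row is fabricated for illustration and not taken from the real dataset:

```python
import io
import pandas as pd

# Fabricated one-row sample mimicking a subset of enterprise_projects.txt;
# in practice you would pass the file path instead of a StringIO buffer.
sample = (
    "url\tproject_id\tsdtc\tmcpc\tmcve\tstar_number\tcommit_count\n"
    "https://github.com/example/project\t42\tTrue\tFalse\tFalse\t120\t3456\n"
)
df = pd.read_csv(io.StringIO(sample), sep="\t")

# Select projects identified by the same-domain top committers heuristic.
sdtc_projects = df[df["sdtc"] == True]
```

Filtering on the three heuristic flags (sdtc, mcpc, mcve) lets users study each identification method separately.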
The objective of the endline surveys in 2016 was to gather household, biomedical, and cognition data in order to evaluate the long-term impact of home supplementation with micronutrient powders (MNP), when combined with seasonal malaria chemoprevention (SMC) and early stimulation, delivered through community preschools and parenting sessions, on the health and cognitive development of children during the first five years of life.
The trial consisted of 3 arms. First, 60 villages with established Early Childhood Development centres (ECD) were randomised to 1 of 2 arms:
1) Children living in villages in the ECD control arm received SMC as part of national health programming and a national parenting intervention delivered by ECD center staff trained and supported by Save the Children, with ALL resident children eligible to participate in the interventions irrespective of enrolment in ECD program (ECD Control group).
2) Children living in villages in the intervention arm also received the SMC and parenting interventions described above, but additionally were eligible to receive home supplementation with micronutrient powders (MNP intervention arm).
3) Second, a third, non-randomised arm was recruited, comprising children living in 30 randomly selected villages where there were no ECD centers in place and thus both the parenting interventions and MNPs were absent. These children received SMC only, as part of national health programming (non-ECD comparison arm).
Trial arm and Interventions received:
T1. MNP intervention arm: 30 villages with ECD centre (randomised); MNP-Yes, Parenting-Yes, SMC-Yes
C1. ECD control arm: 30 villages with ECD centre (randomised); MNP-No, Parenting-Yes, SMC-Yes
C2. Non-ECD comparison arm: 30 villages without ECD centre (not randomised); MNP-No, Parenting-No, SMC-Yes
Three cross-sectional endline surveys took place during the period May-August 2016, three years after the original MNP intervention began, and consisted of the following questionnaires and assessments in two age groups of children, 3 year olds and 5 year olds:
i) A household questionnaire was used to collect data from the primary adult caregiver of the child on home environment, exposure to the interventions, and reported practice outcomes of relevance to the parenting intervention.
ii) Biomedical outcomes were measured in children through laboratory and clinical assessment.
iii) A battery of tests was used to assess cognitive performance and school readiness in children, using a different age-specific test battery for each age group, adapted for local language and culture.
Note: Household and cognitive performance data were gathered from participants in all three arms. Biomedical data were only collected from children in the two randomised arms, to evaluate impact of MNP supplementation on anaemia (primary biomedical outcome) in children who received MNPs and those who did not, using a robust study design.
Districts (cercles) of Sikasso and Yorosso, Region of Sikasso
Individuals and communities
Random sample of target population for the intervention in the 90 communities that consented to participate in the trial, namely pre-school children 0-6 years.
Sample survey data [ssd]
The target population for the interventions comprised all children aged 3 months to 6 years, who were resident in the 90 study communities participating in the trial; the primary sampling unit is the individual child.
Sample Frame:
To identify the number of target beneficiaries, a complete census of all children of eligible age was carried out in the 90 study villages in August 2013. The census listing from 2013 thus defined the population of children who were eligible to have received the interventions every year for the three years between 2013-2016, and was used as the sampling frame of children in whom the impact after three years of implementation of the interventions was evaluated. The intention was to evaluate study outcomes in the same child one year after the start of the MNP intervention (May 2014) and again after three years of the intervention (2016).
A random sample of children was drawn from all children listed in the census for each community participating in the trial, according to the following age criteria:
Date of birth, or age in August 2013 (age group in 2016 surveys):
(i) Born between 1 Jan 2013 – 30 June 2013, or aged <1 year in the 2013 census if DOB not known (aged 3 years in 2016 surveys)
(ii) Born between 1 May 2010 – 30 April 2011, or aged 2 years in the census if DOB not known (aged 5 years in 2016 surveys)
Thus, all children previously randomly selected and enrolled in the evaluation cohort in 2014 were, if still resident in the village and present on the day of the survey, re-surveyed in May 2016.
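The per-village random draw described above can be sketched as follows. This is an illustrative reconstruction only: the record layout, field names, synthetic census data, and the sample-per-village size are assumptions, not the study's actual data format or software.

```python
import random
from datetime import date

def draw_cohort(census, dob_start, dob_end, per_village, seed=2013):
    """Randomly sample up to `per_village` eligible children from each
    village's census listing.

    census: list of (child_id, village, date_of_birth) tuples.
    Eligibility is defined by a date-of-birth window, mirroring the
    age criteria used to define the two evaluation cohorts.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    by_village = {}
    for child_id, village, dob in census:
        if dob_start <= dob <= dob_end:
            by_village.setdefault(village, []).append(child_id)
    sample = {}
    for village, ids in by_village.items():
        k = min(per_village, len(ids))  # villages may have few eligibles
        sample[village] = rng.sample(ids, k)
    return sample

# Illustrative synthetic census for two villages (the real frame listed
# all eligible children in 90 villages).
census = [(f"C{i:03d}", f"village_{i % 2}", date(2013, 1, 1 + i % 28))
          for i in range(60)]

# Draw the 3-year-old cohort: born 1 Jan 2013 - 30 June 2013,
# 20-25 children sampled per village.
cohort_3y = draw_cohort(census, date(2013, 1, 1), date(2013, 6, 30),
                        per_village=25)
```

In the actual survey the same logic was applied per age criterion, and the 2016 round re-surveyed the children drawn in 2014 rather than redrawing.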
Sample Size:
Power analysis was undertaken for a comparison of two arms, taking account of clustering by community. Survey data on biomedical and cognitive outcomes collected in 2014 were used to inform sample size assumptions, including prevalence of primary outcomes, intraclass correlation (ICC) and number of children recruited per cluster. Prevalence of anaemia amongst 3-year-old children in 2014 was found to be 61.6% and 64.0% in the intervention and control arms respectively (p=0.618), and 53.8% and 51.9% respectively amongst 5-year-old children (p=0.582). The observed ICC for the anaemia endpoint at baseline was 0.08 in 3-year-old children and 0.06 in 5-year-old children. The observed ICC for cognitive outcomes measured in 2014 was 0.09, ranging from 0.05 to 0.16 for individual tasks within the cognitive battery.
Sample Size Estimation for Health Outcomes:
Approximately 20-25 children per cluster were recruited into each age cohort in 2013. Power calculations for anaemia (primary endpoint) were undertaken for three alternative scenarios at endline: (i) to allow for the possibility of up to 20% loss to follow up between 2014 and 2016, power calculations were performed for a sample size at endline of 16 children per cluster; (ii) a smaller cluster size of 14 children sampled per village, under a scenario of 30% loss to follow-up; and (iii) unequal clusters, to allow for the possibility that variation in losses to follow-up between villages could result in an unequal number of children sampled in each village. In this case, cluster size is the mean number of children sampled per cluster.
Thus, assuming a conservative prevalence of anaemia of 50% in the control group and an ICC of 0.08, a sample size of 30 communities per arm with 14-20 children sampled per community will, under all of these scenarios, provide 80% power to detect a reduction in anaemia of at least 28% at the 5% level of significance.
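The design-effect arithmetic behind these scenarios can be reproduced with a short script. This is a sketch using the standard normal approximation for comparing two proportions with variance inflated by DEFF = 1 + (m - 1)·ICC; it is not the study team's actual calculation, and the resulting power values are approximate.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cluster_power_two_proportions(p_control, p_interv, clusters_per_arm,
                                  children_per_cluster, icc, z_alpha=1.959964):
    """Approximate power for a two-arm comparison of proportions in a
    cluster-randomised trial, inflating the variance by the design
    effect DEFF = 1 + (m - 1) * ICC."""
    m = children_per_cluster
    deff = 1.0 + (m - 1) * icc
    n = clusters_per_arm * m  # children per arm
    var = deff * (p_control * (1 - p_control)
                  + p_interv * (1 - p_interv)) / n
    z = abs(p_control - p_interv) / sqrt(var)
    return norm_cdf(z - z_alpha)

# Protocol scenario: 50% anaemia in controls, a 28% relative reduction
# (to 36%), 30 communities per arm, ICC = 0.08, cluster sizes 14 and 20.
for m in (14, 20):
    print(m, round(cluster_power_two_proportions(0.50, 0.36, 30, m, 0.08), 3))
```

Under these assumptions both cluster-size scenarios come out above 80% power, consistent with the conclusion quoted above.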
Sample Size Estimation for Cognitive Outcomes:
Power calculations for cognitive outcomes explored: (i) a smaller cluster size of 14 children sampled per village, for example resulting from a higher than expected loss to follow-up of 30%; (ii) statistical analysis of differences between arms which does not adjust for baseline - a scenario which allows for the possibility of increasing the sample size to compensate for losses to follow-up by recruiting new children for whom no baseline data would be available; and (iii) the effect of unequal clusters. Thus, for cognitive-linguistic skills, a sample size of 30 communities per arm with 14-20 children in each age cohort sampled per community will provide 80% power to detect an effect size between 0.27-0.29 at the 5% level of significance, assuming an ICC of 0.10 and that individual, household and community-level factors account for at least 25% of variation in cognitive foundation skills. For a similar sample size of 30 communities per arm with 14-20 children sampled per community and an ICC of 0.10, a statistical analysis which does not adjust for baseline will provide 80% power to detect an effect size between 0.28-0.30 at the 5% level of significance.
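The unadjusted detectable effect sizes quoted above follow from the same design-effect logic. The sketch below assumes the standard minimal-detectable-effect-size formula MDES = (z_alpha + z_beta) * sqrt(2 * DEFF / n) with conventional critical values; the study's exact software and any baseline-adjustment model are not stated here, so this reproduces only the unadjusted scenario approximately.

```python
from math import sqrt

def mdes_cluster(clusters_per_arm, children_per_cluster, icc,
                 z_alpha=1.959964, z_beta=0.841621):
    """Minimal detectable effect size (in SD units) for a continuous
    outcome compared between two arms of a cluster-randomised trial,
    without baseline adjustment. Defaults correspond to 5% two-sided
    significance and 80% power."""
    m = children_per_cluster
    deff = 1.0 + (m - 1) * icc       # design effect for clustering
    n = clusters_per_arm * m         # children per arm
    return (z_alpha + z_beta) * sqrt(2.0 * deff / n)

# 30 communities per arm, ICC = 0.10, cluster sizes 14 and 20.
for m in (14, 20):
    print(m, round(mdes_cluster(30, m, 0.10), 3))
```

With these assumptions the unadjusted MDES falls in roughly the 0.28-0.29 range across the two cluster sizes, in line with the 0.28-0.30 band reported; adjusting for covariates that explain a share R² of the outcome variance would shrink the MDES further, by a factor of about sqrt(1 - R²).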
The sample at endline in May 2016 thus comprised up to 600 children aged 3y and 600 children aged 5y in each arm:
(i) T1 Intervention group (with ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
(ii) C1 ECD control group (with ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
(iii) C2 Comparison group (without ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
Strategy for Absent Respondents/Not Found/Refusals:
Every effort was made to trace children previously recruited into the evaluation cohort. Since some losses to follow-up (for example, due to child deaths or outward migration) were expected between 2014 and 2016, the primary strategy was to oversample in 2014. However, for villages where loss to follow-up was higher than expected and it was not possible to trace a sufficient number of children remaining from the original sample to meet the required sample size per cluster, additional children were recruited into the evaluation survey in 2016. New recruits were selected at random from the children listed as resident in the village at the time of the original census in 2013. All new recruits had thus been resident in the village and exposed to the interventions throughout the three preceding years.
Face-to-face [f2f]
The questionnaires for the parent interview were structured questionnaires, administered to the child's primary caregiver.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project involves conducting and analyzing listening tests using modified webMUSHRA software to evaluate the perceptual accuracy of simulated acoustic environments. The code is structured into five main directories: Docs, containing ethics documents; Modified webMUSHRA Software, including testing code and configurations run with Docker for paired_comparison and subjective_eval tests; Results, storing both raw and processed data from the listening tests; Samples, providing original and convolved audio files with real and simulated Room Impulse Responses (RIRs); and Utils, featuring scripts for generating sine sweeps, convolving and clipping audio, and performing basic statistical analysis. For questions, contact b.christensen@student.tudelft.nl.