Cambridge-Africa

Applied biostatistics training using R software - a first in Tanzania

By Professor Humphrey Mazigo, Dean of the School of Public Health, Catholic University of Health and Allied Sciences (CUHAS), Bugando, Tanzania

With funding from the Cambridge-Africa ALBORADA Research Fund, the Catholic University of Health and Allied Sciences, Bugando, Tanzania, delivered a training course in applied biostatistics using R software. The course was aimed at postgraduate students and faculty in medical schools in East African countries.

Research output in Africa, including in Tanzania, would be enhanced by better data collection methodology and by improved analysis and interpretation. Inappropriate or incorrect application of statistical methods, incomplete interpretation of research findings, insufficient description of key variables, and poor knowledge of statistics are among the most common errors made by authors of biomedical research. To help address these gaps in statistical knowledge and skills, the training was designed to equip postgraduate students and junior researchers with statistical modelling skills using R, a freely available statistical software package.

There was high demand for the course, with 940 applicants. Sessions were delivered both virtually and in person, in two blocks of five weeks, to 180 of the applicants. The applicants were from East African countries and beyond, and the majority (75%) were early-career researchers. In a five-week period, the course covered basic topics in R and advanced statistical modelling for categorical outcomes.

Of the 180 attendees, 70% completed the course. All sessions used a blended format, delivered virtually and face to face simultaneously. The course was divided into two blocks. Block One covered the use of R for statistical analysis. To accommodate applicants with different levels of knowledge and skill, we started with the basic principles of RStudio. We then covered data management and cleaning, with an emphasis on good command-writing habits for beginners. Simple data analyses such as correlation analysis, Student's t-test, the chi-square test, and ANOVA were also taught.

Block Two covered more advanced statistical modelling topics, with a focus on generalized linear models for categorical outcomes. Each topic was followed by hands-on sessions in which participants carried out simple analyses, visualization, and data wrangling using actual datasets provided by the facilitators. Participants also worked on datasets with different types of outcome variable, namely binary, ordinal, and multinomial, and practised checking model assumptions and selecting the best-fitting model using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
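
For readers unfamiliar with the two criteria, their standard textbook definitions are given below. This is a reference note rather than material taken from the course; the symbols are introduced here for illustration, with k the number of estimated parameters, n the number of observations, and \hat{L} the maximized value of the model's likelihood.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

When candidate models are fitted to the same data, the model with the lower value is preferred. BIC penalizes each additional parameter more heavily than AIC once n exceeds about 7 (that is, when ln n > 2), so it tends to favour simpler models.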

The training was conducted by a group of experienced facilitators with PhDs in Epidemiology and Statistics or MScs in Epidemiology and Biostatistics. We would like to thank the facilitators for their participation in this training initiative.

At the end of the training, participants evaluated the sessions, with a focus on their level of satisfaction and the relevance of the material provided. In terms of course content, participants expressed satisfaction with the topics and models covered, particularly data wrangling (54.6% very satisfied), functions (63.6% very satisfied), and GLM modelling (72.7% very satisfied).

The e-learning platform was highly regarded, with 81.8% of participants very satisfied and 100% satisfied with the online assistant. Overall, 81.8% of all participants reported being very satisfied with the course, and all participants said they intended to recommend it to others. For Block One of the course, 92.7% of participants reported being very satisfied. The difference can be explained by the complexity of the material: Block One was a basic introduction, while Block Two covered advanced topics (GLMs). Regarding the relevance of the course, the majority (97.8%) of participants expressed their gratitude to the funder for the opportunity to learn applied statistics using R software, training that is usually available but not accessible because of cost constraints.

This was the first course of its kind to be offered in Tanzania. The experience showed that demand for such training is high, especially among postgraduate students and faculty starting their careers in research.