Shashata Sawmya

Educator | Researcher | Tech Enthusiast

I am a Lecturer in the Department of Computer Science and Engineering at Bangladesh University of Engineering and Technology (BUET). I received my B.Sc. in Computer Science and Engineering from the same university, where I completed my undergraduate thesis under the supervision of Dr. Md. Shamsuzzoha Bayzid. I am also currently working as a student research intern in the Xu Lab of the Computational Biology Department at Carnegie Mellon University. My research interests lie broadly in:

  • designing efficient and scalable computational models for solving real biological problems.

  • analyzing and finding insights in genomics / proteomics / phylogenomics / bioimage data using various algorithmic and learning approaches.

  • designing powerful and robust vision architectures for biomedical image segmentation and analysis.

  • applying machine learning to computational biology and bioinformatics.

Apart from my academic activities, I like to travel to new places. I have visited three countries so far, and it is on my bucket list to increase that number to fifty before I turn 50. I like to hang out with my friends and family in my free time and sing a few songs whenever I can.


Interests

  • Computational Biology and Bioinformatics

Education

  • B.Sc. in CSE (2016 - 2021) (Ranked 3rd in class)

    Bangladesh University of Engineering and Technology



Research

QT-GILD: Quartet based gene tree imputation using deep learning improves phylogenomic analyses despite missing data

Co-Authors: Sazan Mahbub, Arpita Saha, Dr. Md. Shamsuzzoha Bayzid

Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, for a combination of reasons (ranging from sampling biases to more biological causes, as in gene birth and loss), gene trees are often incomplete, meaning that not all species of interest have a common set of genes. Incomplete gene trees can potentially impact the accuracy of phylogenomic inference. We, for the first time, introduce the problem of imputing the quartet distribution induced by a set of incomplete gene trees, which involves adding the missing quartets back to the quartet distribution. We present QT-GILD, an automated and specially tailored unsupervised deep learning technique, accompanied by cues from natural language processing (NLP), which learns the quartet distribution in a given set of incomplete gene trees and generates a complete set of quartets accordingly. QT-GILD is a general-purpose technique needing no explicit modeling of the subject system or reasons for missing data or gene tree heterogeneity. Experimental studies on a collection of simulated and empirical data sets suggest that QT-GILD can effectively impute the quartet distribution, which results in a dramatic improvement in the species tree accuracy. Remarkably, QT-GILD not only imputes the missing quartets but can also account for gene tree estimation error. Therefore, QT-GILD advances the state-of-the-art in species tree estimation from gene trees in the face of missing data. QT-GILD is freely available in open source form at this link.

Status: Accepted at RECOMB 2022. Preprint available on bioRxiv.

Analyzing hCov Genome Sequences: Predicting Virulence and Mutation

Shashata Sawmya, Arpita Saha, Sadia Tasnim, Dr. M. Sohel Rahman

The COVID-19 pandemic, caused by the SARS-CoV-2 coronavirus, has affected millions of people all over the world and taken thousands of lives. It is of utmost importance that the character of this deadly virus be studied and its nature analyzed. We present here an analysis pipeline comprising a classification exercise to identify the virulence of the genome sequences and the extraction of important features from their genetic material, which are subsequently used to predict mutations at those sites of interest using deep learning techniques. We have classified the SARS-CoV-2 genome sequences with high accuracy and predicted the mutations at the sites of interest. In a nutshell, we have prepared an analysis pipeline for hCov genome sequences, leveraging the power of machine intelligence, and uncovered what remained apparently shrouded by raw data. All the code and data (except for the genome sequences) of our pipeline can be found at this link.

Status: Under review. Preprint available on bioRxiv.

Phylogenetic Analyses of SARS-CoV-2 Strains Reveal its Link to the Spread of COVID-19 Across the Globe

Shashata Sawmya, Arpita Saha, Sadia Tasnim, Tanvir Alam, Dr. M. Sohel Rahman

This study leveraged the phylogenetic analysis of more than 10,000 genome sequences of the novel coronavirus (SARS-CoV-2) from 67 countries. Because phylogenetic analysis requires high-end computational power, we used a fast yet highly accurate alignment-free method to build the phylogenetic tree from all the strains of the novel coronavirus. K-Means clustering and a PCA-based dimension reduction technique were used to identify a representative strain from each location. The resulting phylogenetic tree was able to highlight evolutionary relationships of the SARS-CoV-2 genome and was subsequently linked to the interpretation of facts and figures across the globe for the spread of COVID-19. Our analysis revealed that geographical boundaries could not be explained by the phylogenetic analysis of the novel coronavirus, as it placed different countries from Asia, Europe and the USA in very close proximity in the tree. Instead, the commute of people from one country to another is the key to the spread of COVID-19. We believe our study will support policymakers in containing the spread of COVID-19 globally.

Status: Accepted at MedInfo 2021.

Teaching Experiences

July 2021 (BUET)

  • CSE462 Algorithm Engineering Sessional
  • CSE412 Simulation and Modelling Sessional
  • CSE326 Information System Design Sessional
  • CSE314 Operating System Sessional
  • CSE284 Digital Techniques Sessional
August 2021 - Present

Summer 2021 (BracU)

  • CSE230 Discrete Mathematics
  • CSE422 Artificial Intelligence
  • CSE471 System Analysis and Design
June 2021 - September 2021

Spring 2021 (UIU)

  • CSE2233 Theory of Computation
  • CSE2213 Discrete Mathematics
  • CSE1326 Digital Logic Design
  • CSI342 Artificial Intelligence
February 2021 - June 2021

Awards & Honors

  • Winner - Honda Young Engineer and Scientist (Y-E-S) Award 2019
  • Dean's List Award - 4 times
  • University Merit Scholarship - 8 times
  • Higher Secondary Certificate Examination Talentpool Scholarship
  • Honourable Mention, Extraordinary Academic and Extra-curricular activities award of Notre Dame College, Class of 2015
  • 1st Place GDG (Google Developers Group) Devfest Hacksprint

Work Experience

Lecturer, Bangladesh University of Engineering and Technology

September 2021 - Present

Lecturer, Brac University

June 2021 - August 2021

Lecturer, United International University

February 2021 - June 2021

System Engineer (Part-time), Vertical Innovations Ltd.

February 2021 - June 2021

Extra-Curricular Activities