BUET-NUS CSW (Computer Science Workshop)

This workshop will be held on 02 March 2018 as a co-located workshop with WALCOM 2018. The program can be downloaded from here. The details of the talks are as follows:

Isam FAIK
National University of Singapore, Singapore
Title: Digital platforms: A new way of organizing the world
Abstract: This talk will provide an introduction to research in the academic field of ‘Information Systems’ by discussing the rise of digital platforms as a new form of organizing a wide range of human activities, including market transactions, social interactions, and political processes.

Hon Wai LEONG
National University of Singapore, Singapore
Title: Algorithmic pipeline for reconstruction of metabolic networks
Abstract: The metabolic network of a cell is a set of interconnected metabolic processes that determines the cell’s physiological and biochemical properties. With high-throughput sequencing of new species, there is a need to quickly reconstruct whole-genome metabolic networks for them. The traditional pipeline for metabolic network reconstruction starts from genome annotation, and is followed by network assembly, model checking, gap filling and network enhancement. This process involves a high degree of manual curation, is very time-consuming, and creates a bottleneck for obtaining high-quality metabolic networks for new species. In this talk, we discuss an algorithmic pipeline (called NetA) for faster reconstruction of an initial metabolic network, using tools that were initially developed for metabolic gap filling (MeGaFiller) and for enzyme function annotation (EnzDP). In summary, the aim is to automate the reconstruction of an initial metabolic network so that it can be done quickly and without manual effort. Manual curation will then be required only in the later stage, to enhance the network if necessary.

Trevor E. CARLSON
National University of Singapore, Singapore
Title: Memory-level parallelism: An overlooked path to processor efficiency
Abstract: Historically, processor performance has been pursued in the form of general-purpose instruction-level parallelism (ILP) at a significant resource cost. As the benefits of Dennard Scaling come to an end, the limited gains from newer process technologies will no longer allow these increasingly complex and power-hungry methods to improve performance. One potential solution, hardware acceleration, is an important trend in computer architecture, but recent work has shown that the use of accelerators will only exacerbate the problem by increasing the amount of irregular, difficult-to-predict code to be executed on the cores. Computer architects need a solution that addresses two critical concerns: (1) high-performance, energy-efficient computation and (2) fast and flexible software as commonly seen in data centers and mobile devices. In this seminar, I discuss innovative microarchitectures to improve performance in the presence of these processor power limits. I present our recent work, which focuses on Memory Level Parallelism (MLP) and a novel learning algorithm to allow the hardware to optimize the use of these precious resources. In addition, I discuss new processor types that, by leveraging compiler techniques to improve MLP along with hardware assistance, allow for improved performance and energy efficiency by extending the reach of hardware past traditional limits. Taken together, these approaches demonstrate how a focus on MLP improves efficiency and performance for both general-purpose processors and highly efficient processors for future IoT systems, from Fog Computation up to the cloud servers that serve them.

Mun Choon CHAN
National University of Singapore, Singapore
Title: Using barometer for low power context detection
Abstract: The accelerometer is the predominant sensor used for low-power context detection on smartphones. However, the accelerometer is orientation- and position-dependent and requires a high sampling rate, and consequently complex processing and training, to achieve good accuracy. We present an alternative approach for context detection using only the smartphone’s barometer. The barometer is independent of phone position and orientation. Using a low sampling rate of 1 Hz, and simple processing based on intuitive logic, we demonstrate that it is possible to use the barometer for detecting the basic user activities of IDLE, WALKING, and VEHICLE at extremely low power. We evaluate our approach using 47 hours of real-world transportation traces from 3 countries and 13 individuals, as well as more than 900 km of elevation data pulled from Google Maps from 5 cities, comparing power and accuracy to Google’s accelerometer-based Activity Recognition algorithm, and to Future Urban Mobility Survey’s (FMS) GPS-accelerometer server-based application. Our barometer-based approach uses significantly less power and has comparable accuracy to both Google and FMS.

Ee-Chien CHANG
National University of Singapore, Singapore
Title: Adversarial machine learning: Pitfalls in security applications
Abstract: Machine learning techniques have witnessed a steady adoption in a wide range of applications, and have also lent themselves to security tasks. Numerous innovative applications of machine learning in security contexts, especially for detection of security violations, have been discussed in the literature. However, when learning-based systems are deployed for security applications, their effectiveness may be challenged by intentional noise and deviations, and they may be susceptible to evasion through adversarial data manipulation. In this talk, I will give an overview of adversarial machine learning, and present our recent results on classifier evasion through morphing. I will also give a brief outline of security research activities in the Department of Computer Science, National University of Singapore.

Wing-Kin SUNG
National University of Singapore, Singapore
Title: Faster algorithms for 1-mappability of a sequence
Abstract: Consider a string S of length n. The m-mers of S are all the length-m substrings of S. The k-mappability problem asks us to count, for each m-mer x of S, the number of other m-mers y of S that are at Hamming distance at most k from x. This problem finds application in determining the repeat regions in our genome. Here, we focus on the version of the problem where k=1. The fastest known algorithm for k=1 requires O(mn log n / log log n) time. This talk presents some improvements on this problem.
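
To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is only a naive baseline that compares every pair of m-mers in O(n^2 m) time, not the algorithm presented in the talk; the function names `hamming` and `k_mappability` are hypothetical and chosen for illustration. It assumes that "other m-mers" means m-mers starting at a different position, so identical m-mers at different positions count.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def k_mappability(s: str, m: int, k: int = 1) -> list[int]:
    """Naive k-mappability: for each m-mer of s, count the m-mers at other
    positions whose Hamming distance from it is at most k."""
    mers = [s[i:i + m] for i in range(len(s) - m + 1)]
    counts = []
    for i, x in enumerate(mers):
        # Compare against every m-mer that starts at a different position.
        counts.append(
            sum(1 for j, y in enumerate(mers) if j != i and hamming(x, y) <= k)
        )
    return counts


if __name__ == "__main__":
    # The 2-mers of "abab" are "ab", "ba", "ab".
    print(k_mappability("abab", 2, 1))
```

A position with a high count lies in a region that is repeated, exactly or approximately, elsewhere in the sequence.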

Limsoon WONG
National University of Singapore, Singapore
Title: Big data and a bewildered lay analyst
Abstract: A lay analyst faces a number of problems when he has to analyze a dataset. The first problem is that he has to correctly test his hypothesis on the dataset, assuming he has a hypothesis to start with. If he does not already have a hypothesis, he has a second problem, which is to identify some interesting hypothesis from the thousands of patterns that are present in the dataset. The third problem is that he has to derive deep insight from the hypothesis tested. For the first problem, I describe some common mistakes made by lay analysts when they perform statistical hypothesis testing, due to invalid assumptions on samples’ fidelity to real-world populations, null distribution’s appropriateness, null hypothesis’ sensibility, and absence of confounding factors. For the second problem, I describe a tactic of using contingency tables to organize and summarize the deluge of patterns produced by data mining systems. For the third problem, I describe a tactic of looking for exception, trend reversal, and trend enhancement to derive deeper insight from an initial hypothesis.

A. B. M. Alim Al Islam
Bangladesh University of Engineering and Technology, Bangladesh
Title: Computing Solutions to Serve the Under-Served
Abstract: Research efforts on computing solutions have so far shown little focus on under-served people, even though a substantial part of the world’s population is still under-served. This is perhaps due to the challenges of reaching them and of devising solutions that are consistent with their economic constraints. In this talk, I would like to focus on our attempts to overcome these challenges and to devise specialized computing solutions for under-served people. The attempts to be presented in the talk cover the following three specific avenues: 1) devising solutions to bridge education barriers experienced by under-served visually-impaired people, 2) devising a solution to overcome contamination hazards experienced while using shared input devices (for example, kiosks), mostly by under-served people, and 3) developing a solution for securing railways of third-world countries.

Mohammad Saifur Rahman
Bangladesh University of Engineering and Technology, Bangladesh
Title: Sequence based computational methods for protein attribute prediction
Abstract: Due to the rapid development of fast sequencing technologies, the number of sequence-known proteins has grown exponentially in recent years. In contrast, the biochemical experiments to learn the attributes of proteins are expensive and time-consuming. A large gap thus exists between the number of sequence-known proteins and that of attribute-known proteins. To catch up, researchers have started to rely on computational methods to predict different attributes of proteins. These attributes include, but are not limited to, protein structural class, folding rate, cleavage site, antigenicity, subcellular location and so on. In this talk, we will present sequence-based computational methods for protein sub-Golgi localization, antigenicity prediction and identification of DNA-binding proteins. The Golgi Apparatus (GA) is a key organelle for protein synthesis within the eukaryotic cell. GA proteins can be categorized into two types, namely, cis-Golgi proteins and trans-Golgi proteins. The exact classification of GA proteins may contribute to developing drugs to treat neurodegenerative and inherited diseases. While several recent predictors demonstrate good predictive performance in sub-Golgi localization, one drawback of these predictors is that they depend on construction of PSSM, which is time-consuming. We propose a new predictor that is independent of PSSM and based principally on sequence information. We then talk about DNA-binding proteins (DNA-BPs), which can bind to and interact with DNA. Such proteins organize and compact the DNA, and regulate and affect various cellular processes such as transcription, DNA replication, recombination, repair and modification. DNA-BPs can potentially be used for drug development in treating genetic diseases and cancers. This is why developing efficient and highly accurate methods to identify DNA-BPs is a very important research problem. We propose a simple yet effective predictor for DNA-BPs in this talk.
Another important protein attribute prediction problem is whether a given protein is antigenic or not. A protein is antigenic if it is capable of triggering a significant immune system response. Such proteins are of immense importance in vaccine preparation and drug design. We propose a new protective antigen predictor that has a simple architecture and can work with features extracted from the sequence alone.

Rifat Shahriyar
Bangladesh University of Engineering and Technology, Bangladesh
Title: High performance garbage collection to manage future memory better
Abstract: Garbage collection design and implementation are both characterized by stark choices. Garbage collection designs must choose between tracing and reference counting. Garbage collector implementations must choose between exact and conservative collection. Performance concerns have led to tracing and exact collection dominating, a choice evident today in highly engineered systems such as HotSpot, J9, and .NET. However, many other well-established systems use either reference counting or conservative garbage collection, including implementations for widely used languages such as PHP and JavaScript. Today reference counting and conservative garbage collection are widely used, but generally in non-performance-critical settings because their implementations suffer significant overheads. This talk will focus on the design and implementation of new algorithms and mechanisms for reference counting and conservative garbage collection that significantly improve performance to the point where they are competitive with today’s best copying generational tracing collectors. These insights and advances are likely to particularly impact the development of new and emerging languages, where the implementation burden of tracing and exactness is often the critical factor in the first implementation.

Md. Shamsuzzoha Bayzid
Bangladesh University of Engineering and Technology, Bangladesh
Title: Estimating species trees from gene trees
Abstract: Phylogenetic trees (evolutionary trees) provide insights into basic biology, including how life evolved, the mechanisms of evolution and how it modifies function and structure, orthology detection, disease evolution, etc. Estimations of species trees are typically based on multiple genes, in some cases from throughout the whole genome. Central to constructing phylogenetic trees is the ability to efficiently analyze the vast amount of genomic data available these days due to the tremendous advancement in sequencing techniques. The ongoing big data revolution in genomics can vastly increase our understanding of biology only if our computational toolkit can keep pace with the ever-increasing abundance of molecular data. In this presentation, we will focus on fast and accurate species tree estimation from genes sampled throughout the whole genome, considering various challenging scenarios that frequently arise in phylogenomic analysis.