s·nr: A Visual Analytics Tool
s·nr: Visual Analytics Framework for Contextual Analyses of Private and Public RNA-Seq Data
Next-Generation Sequencing (NGS) has become an essential tool in molecular biology. Reduced costs and automated analysis pipelines make NGS data feasible to use even for small labs, yet methods for interpreting the data have not kept pace with the amount of information it contains.
Often, insight comes not from analysing datasets in isolation but from context, i.e. from visually comparing datasets side by side. This approach to data exploration is termed Visual Analytics.
We have developed s·nr (pronounced sonar), a Visual Analytics pipeline that provides simple yet powerful visual interfaces for displaying and querying NGS data. It allows researchers to explore their own data in the context of experimental data deposited in public repositories, and to extract specific data sets with similar gene expression signatures. We tested s·nr on 1,543 RNA-Seq-based mouse differential expression profiles derived from the public ArrayExpress platform. The repository of processed data is available with our paper (Klemm et al. 2019, BMC Bioinformatics).
s·nr is easily deployable thanks to its containerized implementation. It enables researchers to analyze their own RNA-Seq data and relate it to data from public repositories through interactive, contextual crosstalk. This allows users to derive novel and unbiased hypotheses about the underlying molecular processes.
The analysis typically focuses on one experiment. To put it into the context of other data and observe patterns across multiple experiments, we include additional data. The user interface is divided into two major components. The overview visualization shows which experiments have similar expression profiles and allows users to select experiments for further investigation (Illustration 1).
Illustration 1: Overview Principal Component Analysis (PCA) plot for 1,543 mouse differential RNA-Seq expression profiles. We derived the data from ArrayExpress, processed it using QuickNGS, and provide the result with this paper. The analysis starts with the overview plot showing the first two PCs of the p-values of all genes of the public and private data sets. Public data sets are uniformly assigned the box icon and a higher transparency so that the user's own data are easy to identify. Data sets with similar p-values cluster together. (a) Mouse-over shows the metadata of a data set. Clicking on a data set icon fetches its data from the server and passes it to the details view. Icons of downloaded data sets are rendered orange; data sets that are still loading flash. (b) The displayed PCA is calculated on all genes. By brushing genes in the details view, the user can narrow down the genes of interest and trigger a new PCA calculation based on the selected group of genes. (c) The dot of the s·nr logo emits a fading circle when data are fetched from the server.
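The overview idea (project every data set onto the first two PCs of its per-gene p-values, optionally restricted to a brushed gene subset) can be sketched as below. This is a minimal illustration, not s·nr's implementation: the function name `pca_overview`, the `(n_datasets, n_genes)` matrix layout and the use of raw, untransformed p-values are assumptions, and the PCA is computed via an SVD of the gene-centred matrix.

```python
import numpy as np

def pca_overview(pvalues, gene_mask=None, n_components=2):
    """Project data sets onto their leading principal components.

    pvalues: array of shape (n_datasets, n_genes), one row of per-gene
        p-values per differential expression profile (assumed layout).
    gene_mask: optional boolean array of length n_genes that restricts the
        PCA to a user-selected (brushed) subset of genes (hypothetical parameter).
    """
    X = np.asarray(pvalues, dtype=float)
    if gene_mask is not None:
        # Keep only the columns (genes) the user selected.
        X = X[:, np.asarray(gene_mask, dtype=bool)]
    # Centre each gene so the components capture variation between data sets.
    Xc = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are the principal axes, S the singular values.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # PC scores (one row per data set) are U scaled by the singular values.
    return U[:, :n_components] * S[:n_components]
```

Calling `pca_overview(p)` gives the initial all-gene overview; calling it again with a `gene_mask` mirrors step (b), where a brushed gene selection triggers a new PCA. Data sets with similar p-value profiles end up with nearby scores and therefore cluster in the plot.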
The selected experiments can be analyzed in the details view, which provides simple yet efficient means for displaying and querying the data as well as for extracting GO terms. The gene selection feeds back into the overview visualization, where the PCA can be recalculated on only the user-defined subset of genes, refining the search for functionally similar experiments (Illustration 2).
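The text does not state which statistic s·nr uses to extract GO terms for a gene selection. A common choice is a one-sided hypergeometric over-representation test, sketched below as an assumption rather than the tool's actual method. The names `go_enrichment` and `hypergeom_sf` and the `{GO term: set of gene IDs}` annotation format are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

def go_enrichment(selected, universe, go_annotations):
    """Rank GO terms by over-representation among the selected genes.

    selected: gene IDs brushed by the user
    universe: all gene IDs measured in the focus experiment
    go_annotations: dict mapping GO term -> set of gene IDs (assumed format)
    Returns (term, overlap, p_value) tuples sorted by ascending p-value.
    """
    universe = set(universe)
    selected = set(selected) & universe
    N, n = len(universe), len(selected)
    results = []
    for term, genes in go_annotations.items():
        term_genes = set(genes) & universe  # annotated genes that were measured
        k = len(term_genes & selected)      # annotated genes inside the selection
        if k == 0:
            continue  # a term with no selected genes cannot be enriched
        results.append((term, k, hypergeom_sf(k, N, len(term_genes), n)))
    return sorted(results, key=lambda r: r[2])
```

The sketch reports uncorrected p-values. Because many GO terms are tested at once, a real analysis would normally apply a multiple-testing correction such as Benjamini–Hochberg before ranking terms.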
Illustration 2: Details view for five mouse data sets with the GO term pane open. The details view consists of three major interconnected components (a–c); additionally, the GO term pane (d) is open. (a) The focus experiment is depicted in the large scatter/hex plot. (b) Further data sets are visualized as small multiples of the large scatter plot. (c) The table view shows detailed information per gene for the focus experiment. (d) The GO term pane shows GO terms for the selected genes and associated options. Opening a GO term displays additional information about the term as well as its expression in the context experiments. Each GO term is represented by a GO plot, which can be customized in the pane's options. A typical interaction example is depicted at the bottom: brushing (selecting) genes in the main scatter plot highlights the corresponding genes in the context experiments and automatically triggers a GO-term analysis of the selection.
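The linked-brushing interaction in the caption (a rectangle drawn in the main scatter plot selects genes, and the same genes are then highlighted in every context experiment) can be sketched as follows. The axes of the scatter plot are not specified here; the example assumes each gene has an `(x, y)` position such as (log2 fold change, -log10 p-value). `Brush`, `brush_genes` and `link_selection` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Brush:
    """Axis-aligned rectangle drawn by the user in the main scatter plot."""
    x0: float
    x1: float
    y0: float
    y1: float

    def contains(self, x, y):
        # Normalise corners so a brush dragged in any direction works.
        return (min(self.x0, self.x1) <= x <= max(self.x0, self.x1)
                and min(self.y0, self.y1) <= y <= max(self.y0, self.y1))

def brush_genes(focus, brush):
    """Return IDs of genes whose position in the focus plot lies inside the brush.

    focus: dict gene_id -> (x, y) in the focus experiment (assumed axes).
    """
    return {g for g, (x, y) in focus.items() if brush.contains(x, y)}

def link_selection(selected, context_experiments):
    """Map each context experiment to the positions of the selected genes it contains.

    context_experiments: dict experiment name -> {gene_id: (x, y)}.
    Genes missing from an experiment are simply not highlighted there.
    """
    return {
        name: {g: coords[g] for g in selected if g in coords}
        for name, coords in context_experiments.items()
    }
```

Keeping the selection as a plain set of gene IDs decouples the views: the same set can drive highlighting in the small multiples, filtering in the table view, and a GO-term analysis.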