COBI Lab - TimesVector





TimesVector-web tutorial

  • TimesVector-web provides analysis of time-series RNA sequencing data
    • Recommends K for K-means clustering using the elbow point
    • Clustering based on cosine similarity
    • Enrichment & pathway information from g:Profiler results
    • Additional gene ID mapping to the CISBP & miRDB databases for TFs & miRNAs
  • Follow the steps below

TimesVector-web workflow


TimesVector-web input interface

TimesVector-web input interface is divided into four sections.

① Input files info

  • Upload the gene expression files to analyze
  • Check whether the information on the uploaded file is correct through the info table.

② Options for input file

  • 'Use only protein coding genes'
    • Users can restrict the analysis to the protein-coding genes in the gene list
    • If the user clicks 'yes', only genes that are labeled as protein coding are selected from the input file
  • 'Data type'
    • This option selects whether the input file contains microarray or RNA-seq data.
  • 'Do you need Normalization?'
    • TimesVector-web provides an option to normalize the input data
    • If the user clicks 'yes', quantile normalization is performed on the data

③ Characteristics of input file

  • K is the desired number of clusters (INTEGER)
    • If K is unknown, you can get a recommended K by clicking "K test"
    • K is recommended based on the elbow method graph
  • Max K is a parameter that sets the range of K values to consider
  • Ex) Elbow method graph produced by the K test (Max K: 800)

④ Organism for biological downstream analysis

  • Based on the selected organism, the gene list of each cluster is converted to RefSeq IDs and ENSEMBL IDs using Biomart
  • The selected organism and the converted ENSEMBL ID list are the parameters for g:Profiler
  • TimesVector-web supports organisms such as Homo_sapiens, Mus_musculus, Oryza_sativa_Japonica_Group and Saccharomyces_cerevisiae.

⑤ Run TimesVector by clicking "RUN"

  • Wait until the process is finished
    • If you don't want to keep watching the monitor, you can use the shortcut code
    • Copy the shortcut code after starting the run and paste it on the main page
    • All results are removed every two days

STEP ONE : Data pre-processing

This step covers the requirements for pre-processing the data the user wants to analyze on TimesVector-web. First, our web service only supports time-series RNA sequencing data with multiple conditions. The user must prepare one input file per condition. For example, if the data to be analyzed has three conditions, three input files are required, as shown in Figure 1. Such data can be downloaded from the Gene Expression Omnibus (GEO). Second, before uploading your data, you need to convert the files to the format described below.

  • The input file should be a TAB delimited gene expression matrix.
  • It is recommended that all uploaded files contain the same number of genes.
  • Our web service recognizes the first row of the input file as a header.
  • The first column of the header is mandatory and must be named 'GeneID', exactly as written.
  • Except for the first column, column names must follow this syntax:
    • 'Condition'_'TimePoint' (e.g., DV10_Day2).
    • The condition and time point are separated by an underscore character ("_").
    • If the input file contains experimental replicates, use 'Condition'_'TimePoint'_'Replicates' (e.g., DV10_Day2_rep1).
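
For users who prepare their files with a script, the header rules above can be checked before uploading. The following is a minimal sketch, not part of TimesVector-web: the function name `parse_header` is hypothetical, the `Day4` column is made up for illustration, and it assumes that condition, time point and replicate labels do not themselves contain an underscore.

```python
import re

# Column names: Condition_TimePoint with an optional _Replicate suffix.
# Assumption: the individual labels contain no underscore or TAB.
COLUMN_RE = re.compile(r"^([^_\t]+)_([^_\t]+)(?:_([^_\t]+))?$")


def parse_header(header_line):
    """Split a TAB-delimited header into (condition, time_point, replicate) tuples."""
    fields = header_line.rstrip("\r\n").split("\t")
    if fields[0] != "GeneID":
        raise ValueError("first column must be named 'GeneID'")
    parsed = []
    for name in fields[1:]:
        match = COLUMN_RE.match(name)
        if match is None:
            raise ValueError(
                f"column {name!r} does not follow Condition_TimePoint[_Replicate]"
            )
        # replicate is None when the column has no replicate suffix
        parsed.append(match.groups())
    return parsed


# Hypothetical header with two replicates of one time point and a second time point.
columns = parse_header("GeneID\tDV10_Day2_rep1\tDV10_Day2_rep2\tDV10_Day4_rep1")
print(columns)
```

A header that does not start with `GeneID`, or a column name without an underscore, raises a `ValueError` that names the problem.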



    Figure 1: Above are sample input data files. The first column contains GeneID and the remaining columns contain the time series of gene expression values. There are three input files, one for each of the conditions A, B, and C, and each file has three time points: 20 minutes, 40 minutes, and 60 minutes.



    Figure 2: Above is an input file that contains experimental replicates.

* If you are not comfortable with programming, we recommend using a spreadsheet tool.


STEP TWO : Upload input files

This step is for the user to upload the data obtained through STEP ONE.

  • First, click the browse button.



  • Second, please select the files to upload as shown in the picture below.




  • Third, check whether the information on the uploaded file is correct through the info table.



STEP THREE : Select options for input files

This step presents options that help with the analysis. TimesVector-web offers three options.

  • 'Use only protein coding genes'
    • This option selects only protein-coding genes from the input files uploaded by the user.

  • 'Data type'
    • This option selects whether the input file contains microarray or RNA-seq data.

  • 'Do you need Normalization?'
    • This option performs log2 transformation and quantile normalization on the user's input file.
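
To make the normalization option concrete, here is a minimal sketch of a log2 transformation followed by a textbook quantile normalization. It is not the TimesVector-web implementation: the pseudocount, the order of the two operations, the tie handling and the small example matrix are all assumptions made for illustration.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value (the pseudocount is an assumption)."""
    return [[math.log2(x + pseudocount) for x in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution of values.

    matrix is a list of gene rows; each row holds one value per sample.
    Ties are broken by row order, which is a simplification.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # Mean of the r-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / n_cols for r in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices of this sample, ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result


# Hypothetical matrix: 4 genes (rows) x 3 samples (columns).
expression = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(expression)
pipeline = quantile_normalize(log2_transform(expression))
```

After quantile normalization every column contains the same set of values, only assigned to different genes according to each gene's rank within that sample.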


STEP FOUR : Select the number of clusters on input data

This step recommends an appropriate number of clusters for the input data.
To analyze gene expression patterns in the data, it is important to select an appropriate number of clusters, K. An appropriate K is the smallest number of clusters for which genes within the same cluster are close to each other and genes in different clusters are far apart. Our web service recommends K through the 'K-test' button and the 'maxK' parameter. The maxK parameter sets the range of K to consider, since the result of the K test can vary with the size and characteristics of the input data.
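
The tutorial overview lists cosine similarity as the measure used for clustering. As a rough illustration of what "close" can mean for two expression profiles, the sketch below computes the cosine similarity of two vectors. How TimesVector-web applies this measure internally is not described here, and the gene vectors are made-up values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical expression profiles over three time points.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]  # same trend as gene_a, twice the magnitude
gene_c = [3.0, 2.0, 1.0]  # opposite trend

print(cosine_similarity(gene_a, gene_b))
print(cosine_similarity(gene_a, gene_c))
```

Because the measure depends only on the direction of the vectors, `gene_a` and `gene_b` are treated as having the same pattern even though their magnitudes differ. A distance can be derived as one minus the similarity.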



    Figure 1: Above is the section for selecting the appropriate number of clusters. If K is not given, the analysis runs with an expected K, which is determined by the formula used in TimesVector.

If the user clicks the K-test button, the service recommends an appropriate number of clusters (K) for the input data, as shown in Figure 2. The K-test can take several minutes depending on the size of the data and maxK.


    Figure 2: Above is the recommended-K graph obtained by setting maxK to 800 and running the K-test.
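
One common way to read an elbow graph automatically is to pick the point that lies farthest from the straight line joining the two ends of the curve. The sketch below shows that heuristic. It is an assumption for illustration only: the exact rule used by the K-test is not stated in this tutorial, and the function name `elbow_point` and the score values are hypothetical.

```python
import math


def elbow_point(ks, scores):
    """Return the K whose point is farthest from the line joining the curve's ends.

    ks must be increasing; scores are clustering errors (lower is better).
    Both axes are rescaled to [0, 1] so that neither dominates the distance.
    """
    if len(ks) < 3 or max(scores) == min(scores):
        raise ValueError("need at least three K values and non-constant scores")
    k_min, k_max = ks[0], ks[-1]
    s_min, s_max = min(scores), max(scores)
    xs = [(k - k_min) / (k_max - k_min) for k in ks]
    ys = [(s - s_min) / (s_max - s_min) for s in scores]
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x2 - x1, y2 - y1)
    best_k, best_dist = ks[0], -1.0
    for k, x, y in zip(ks, xs, ys):
        # Perpendicular distance from (x, y) to the line through the end points.
        dist = abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / length
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k


# Hypothetical within-cluster error for K = 1..6: a steep drop, then a plateau.
ks = [1, 2, 3, 4, 5, 6]
scores = [100.0, 60.0, 30.0, 25.0, 22.0, 20.0]
print(elbow_point(ks, scores))
```

In this made-up curve the error stops falling quickly after K = 3, which is the point the heuristic selects.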

STEP FIVE : Select an organism for biological downstream analysis

This step is to select an organism corresponding to the user's input data. The organism is the parameter for g:Profiler and our web service supports organisms such as Homo_sapiens, Mus_musculus, Oryza_sativa_Japonica_Group and Saccharomyces_cerevisiae.




    Figure 1: Above is the organism select box.
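
g:Profiler identifies organisms by short IDs built from the first letter of the genus and the species name (for example, hsapiens for human). The sketch below derives such an ID from the organism names listed above. Whether TimesVector-web performs this conversion in this way is an assumption, the function name is hypothetical, and the resulting IDs should be checked against g:Profiler's organism list before use.

```python
def gprofiler_organism_id(organism):
    """Turn e.g. 'Homo_sapiens' into 'hsapiens' (first genus letter + species name)."""
    parts = organism.split("_")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected Genus_species, got {organism!r}")
    genus, species = parts[0], parts[1]
    # Anything after the species (e.g. 'Japonica_Group') is ignored.
    return (genus[0] + species).lower()


# Organism names as listed in this tutorial.
SUPPORTED = [
    "Homo_sapiens",
    "Mus_musculus",
    "Oryza_sativa_Japonica_Group",
    "Saccharomyces_cerevisiae",
]
for name in SUPPORTED:
    print(name, "->", gprofiler_organism_id(name))
```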

STEP SIX : Run TimesVector-web

  • After uploading the data and selecting the parameters, the user can run the analysis.
  • If the program runs successfully, a shortcut code is generated.
  • The user can wait on this page until the program finishes.
  • Alternatively, the user can copy the shortcut code and leave this page, then find the result page by pasting the shortcut code on the Home page.
  • Depending on the data size, the program may take around 30 minutes to finish.


STEP SEVEN : Interpret your Results

  • Summary of the result
  • Thumbnail of Clusters
    • The plots of the DEP, ODEP and SEP clusters act as buttons for selecting a pattern
  • Genes in cluster
    • The left plot shows only the cluster's representative genes
    • The right plot shows every gene in the cluster
  • Gene List
    • List of genes in the cluster
    • Gene description
    • Gene type
    • Matched RefSeq ID
    • Matched ENSEMBL ID
  • cisBP
    • Result of mapping the gene set to the transcription factor (TF) database cisBP
  • miRDB
    • microRNA (miRNA) prediction and functional annotation database miRDB
    • The number of microRNAs counted in the cluster is shown above
    • The list of microRNAs in the cluster is shown below
  • g:Profiler result
    • g:Profiler result for the gene set and the selected organism
    • To run g:Profiler, every gene in every cluster of each of DEP, ODEP and SEP is used.