TRee-based Exploration of Neighborhoods and Domains

Maintenance will be carried out on May-26-2020 at 11:00 (EDT). Please finish your jobs before then: the server will be stopped and any running jobs will be lost. The server will resume operation on May-26-2020 at 14:00 (EDT).


1) To draw gene neighborhoods successfully, please provide protein identifiers from the RefSeq database. On the BLASTP page, this database is listed in the 'Database' section as the 'Reference proteins (refseq_protein)' option. After selecting this option, run BLASTP against this database with your protein of interest, collect the homologous proteins, and use the collected sequence set as input for TREND.
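As a rough pre-check before submitting, the sketch below separates identifiers that look like RefSeq protein accessions from everything else. It assumes the standard RefSeq protein prefixes (AP_, NP_, YP_, XP_, WP_) followed by digits and an optional version suffix. The function name and the example identifiers are made up for illustration, and a matching pattern does not guarantee that the genome is already available in TREND.

```python
import re

# Assumed pattern: RefSeq protein prefix, underscore, digits, optional ".version".
REFSEQ_PROTEIN = re.compile(r"^(AP|NP|YP|XP|WP)_\d+(\.\d+)?$")

def split_by_refseq(identifiers):
    """Return (refseq_like, other) lists, preserving input order."""
    refseq_like, other = [], []
    for identifier in identifiers:
        cleaned = identifier.strip()
        if REFSEQ_PROTEIN.match(cleaned):
            refseq_like.append(cleaned)
        else:
            other.append(cleaned)
    return refseq_like, other

# Placeholder identifiers, not real records.
refseq_like, other = split_by_refseq(["NP_123456.1", "P12345", " WP_000000001.1 "])
print("RefSeq-like:", refseq_like)
print("Other:", other)
```

Identifiers that end up in the second list are the ones most likely to come back without a gene neighborhood.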

2) If gene neighborhoods still cannot be retrieved for some of your proteins even though you used RefSeq proteins, the corresponding organisms are not in our database yet. Thousands of new genomes are deposited to NCBI every week. After quality assessment confirms that the genomes meet the deposition standards, they are gradually migrated to the RefSeq database. Once the genomes are there, the MiST database collects and processes them, and TREND can then process and show the neighboring genes for those genomes.

3) After you have collected a set of homologous proteins, for example by running BLASTP, a redundancy reduction step is necessary because the set will contain numerous identical or very similar sequences. TREND generates a file of sequence clusters, which you can use to identify how many and which sequences are represented by each representative sequence passed on to the pipeline after the reduction step.
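To illustrate the idea of a representative sequence standing in for a cluster, here is a minimal sketch that collapses only exact duplicates. This is a deliberately simplified stand-in and not TREND's reduction method, which also merges very similar sequences. The function name and the toy records are hypothetical.

```python
def collapse_identical(records):
    """Group (identifier, sequence) pairs by exact sequence.

    The first identifier seen for a sequence becomes the representative.
    Returns a dict: representative identifier -> list of all member identifiers.
    """
    by_sequence = {}
    for identifier, sequence in records:
        by_sequence.setdefault(sequence.upper(), []).append(identifier)
    return {members[0]: members for members in by_sequence.values()}

# Toy records, not real proteins.
records = [
    ("seqA", "MKTAYIAKQR"),
    ("seqB", "MKTAYIAKQR"),   # identical to seqA
    ("seqC", "MSLNVQDELT"),
    ("seqD", "mktayiakqr"),   # identical to seqA, ignoring case
]

for representative, members in collapse_identical(records).items():
    print(representative, "represents", len(members), "sequence(s):", members)
```

The resulting mapping plays the same role as the cluster file described above: it records how many and which sequences each representative stands for.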

4) FFT-NS-i is a fast, high-quality alignment algorithm. Once you understand your data, you may run the more robust L-INS-i, G-INS-i or E-INS-i algorithms on a smaller, refined representative set of sequences. L-INS-i is recommended when the proteins have one common alignable region, G-INS-i when the proteins can be aligned along their entire length, and E-INS-i when the proteins have several alignable regions interspersed with unalignable, less common regions.
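These algorithm names are MAFFT alignment strategies. If you want to rerun one of them locally on a refined set, the sketch below builds the corresponding command line. The option sets follow the MAFFT manual; whether TREND uses exactly these options is an assumption, and the helper name and file name are hypothetical. The command is only constructed here, not executed.

```python
# Option sets as documented in the MAFFT manual for each strategy.
MAFFT_OPTIONS = {
    "FFT-NS-i": ["--retree", "2", "--maxiterate", "2"],
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],
    "E-INS-i": ["--genafpair", "--maxiterate", "1000"],
}

def mafft_command(algorithm, fasta_path):
    """Return the argument list for running MAFFT with the chosen strategy."""
    if algorithm not in MAFFT_OPTIONS:
        raise ValueError(f"Unknown algorithm: {algorithm}")
    return ["mafft", *MAFFT_OPTIONS[algorithm], fasta_path]

# "refined_set.fasta" is a placeholder file name.
print(" ".join(mafft_command("L-INS-i", "refined_set.fasta")))
```

MAFFT writes the alignment to standard output, so in practice the command would be run with its output redirected to a file.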

5) FastTree is an approximately maximum-likelihood algorithm that produces phylogenetic trees of very good quality. We recommend using it as a first exploratory step of your analysis, or as the only step if you have a large dataset. Once exploring the FastTree result has shown you what kind of data you are dealing with, try the MEGA algorithms on a smaller, refined dataset. You can reduce the dataset redundancy by running the redundancy reduction step of TREND.

6) Adjusting the 'Not shared domains tolerance' parameter can help to identify gene clusters that have only some domains in common. This parameter can uncover subtle, non-obvious regularities.
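One way to think about such a tolerance is sketched below. It assumes, purely for illustration, that two genes are grouped when they share at least one domain and the number of domains present in only one of them does not exceed the tolerance. TREND's actual rule may differ; the function name and the domain lists are hypothetical examples.

```python
def within_tolerance(domains_a, domains_b, tolerance):
    """Assumed rule: at least one shared domain, and at most
    `tolerance` domains that occur in only one of the two genes."""
    a, b = set(domains_a), set(domains_b)
    shared = a & b
    not_shared = a ^ b  # domains found in exactly one of the two genes
    return bool(shared) and len(not_shared) <= tolerance

gene_1 = ["HisKA", "HATPase_c"]
gene_2 = ["PAS", "HisKA", "HATPase_c"]

for tolerance in (0, 1):
    print("tolerance", tolerance, "->", within_tolerance(gene_1, gene_2, tolerance))
```

Under this assumed rule, raising the tolerance lets genes with partially overlapping domain sets fall into the same cluster.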