Protein repeats and compositionally biased regions

Definition, tools for detection, interpretation of results, function and structure

Miguel Andrade

This part is split into two sessions. The first introduces repeats and compositionally biased regions (CBRs) and then deals with repeats; the second deals with CBRs. Both sessions include exercises. In the exercises, I try to use online tools or Java software that should run on your computer without extra installations: JalView (also used in other lessons), Dotlet, BiasViz2. You should be able to look at some 3D examples using Chimera on PDB files or using other online PDB viewers.

See below for the exercises for these lessons with links to the sequences and files you have to use.

Find here links to the corresponding PowerPoint files for each presentation:

  • Part 1 on repeats: andrade_repeats
  • Part 2 on CBRs: andrade_biased5

Part 1. Protein repeats

Exercise 1. Using Dotlet with the human mineralocorticoid receptor (MR):

  • Obtain the sequence from UniProt
  • Go to the Dotlet web page
  • Click on the input button and paste the sequence there
  • Try to find combinations of parameters that show patterns in the dot plot
  • Find repetitions by clicking on the diagonal patterns
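
To get a feeling for what the dot plot parameters do, here is a minimal Python sketch of a self-comparison dot plot. It is not Dotlet's implementation: Dotlet scores windows with a substitution matrix, whereas this sketch only counts identical residues, and the function names, the toy sequence, and the parameter values are all made up for illustration. Internal repeats appear as hits that share the same offset from the main diagonal.

```python
from collections import Counter


def dot_plot(seq_a, seq_b, window=5, min_matches=4):
    """Return the set of (i, j) window start positions (0-based) whose
    windows share at least `min_matches` identical residues."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                hits.add((i, j))
    return hits


def off_diagonal_offsets(hits):
    """Count hits per offset (j - i) above the main diagonal.
    In a self-comparison, a frequent offset suggests a repeat period."""
    return Counter(j - i for i, j in hits if j > i)


# Hypothetical toy sequence: a 7-residue unit repeated three times.
toy = "MK" + "ACDEFGH" * 3 + "WY"
offsets = off_diagonal_offsets(dot_plot(toy, toy, window=5, min_matches=5))
for offset, count in offsets.most_common():
    print(f"offset {offset}: {count} hits")
```

Lowering `min_matches` relative to `window` makes the plot noisier but lets diverged repeats show up, which is the trade-off you explore with the Dotlet controls.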

Exercise 2. Using JalView with a MSA of the MR with orthologs:

  • Run JalView using the JNLP file in desktop (from
  • Load this MSA in JalView
  • Use the “find” option with a regular expression and mark all matches
  • Try to find the expression that matches the most repeats (example: [LS].SP)
    • How many repeats do you see?
    • How long are they?
    • Would you correct the alignment based on these findings?
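
If you want to try regular expressions outside JalView, the following Python sketch shows how the example expression [LS].SP can be matched against a sequence. The sequence is an invented toy example, not the MR, and `find_motif` is a hypothetical helper name. Note that `re.finditer` reports non-overlapping matches only.

```python
import re


def find_motif(sequence, pattern):
    """Return (1-based start position, matched text) for each
    non-overlapping match of `pattern` in `sequence`."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]


# Hypothetical toy sequence containing three copies of the motif.
toy = "MALASPGGSTSPKKLPSPQQ"
for start, text in find_motif(toy, r"[LS].SP"):
    print(start, text)
```

The distance between consecutive match positions gives a first estimate of the repeat length.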

Exercise 3. Detecting repeats in human huntingtin:

  • Go to the ARD2 web page
  • Click on the input button and paste this sequence (1-780 fragment of human huntingtin) in the input window
  • Run ARD2 and interpret the output

Exercise 4. Viewing detected repeats in a protein structure:

  • Go to the PDBPaint web page
  • In the “Query PDB” window type “2IE3”, and in the “web service” menu choose the “ARD2” option
  • Hit the “Go!” button
  • Rotate the structure and examine how the hits correspond to the structure

Part 2. Compositionally biased regions

Exercise 1. Filtering CBRs for BLAST using SEG:

  • Obtain this protein sequence from NCBI. This is a hypothetical protein from Nematocida sp., a microsporidian (a spore-forming fungus) that infects the worm Caenorhabditis elegans.
  • Can you spot anything unusual in this sequence?
  • Go to the NCBI BLAST web page and choose the “protein blast” option
  • Search for homologs of the protein
  • Keep the output
  • Do the same search in another NCBI BLAST window, this time selecting the option to filter low complexity regions using SEG
  • Compare the outputs: Can you identify different hits? Do matches to the same sequence have relevant differences in the E-value? Comment on the relevance of the differences.
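
To illustrate what masking low complexity regions means, here is a small Python sketch that replaces residues with X wherever a sliding window has low Shannon entropy. This is a simplification and not the SEG algorithm itself; the window size, the threshold, the function names, and the toy sequence are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace with 'X' every residue that falls inside at least one
    window whose entropy is at or below `threshold` (illustrative values)."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for k in range(start, start + window):
                masked[k] = "X"
    return "".join(masked)


# Hypothetical toy sequence with a poly-Q stretch in the middle.
toy = "MKTAYIAKQRQI" + "Q" * 15 + "SFVKSHFSRQLE"
print(toy)
print(mask_low_complexity(toy))
```

Masked residues cannot seed or extend an alignment, which is why hits that depend only on a biased region tend to disappear or get worse E-values once filtering is switched on.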

Exercise 2. Viewing CBRs in an alignment with BiasViz2:

  • Go to the BiasViz2 web page
  • Launch BiasViz2
  • Load this alignment in the step 1 section
  • Hit the “Go to graphical view” button
  • This protein is rather short so you might have to zoom in a bit (zoom controls are in the top right corner of the graphics window)
  • Try to find combinations of parameters that reveal CBRs
  • Try hydrophobic residues and window size 10. If I tell you this is a transmembrane protein, what is this result telling you?
  • Can you see other biased regions?
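
The idea of scanning for a residue class with a sliding window can be sketched in a few lines of Python. This works on a single ungapped sequence rather than an alignment, and it is not BiasViz2's code; the hydrophobic residue set, the function name, and the toy sequence are assumptions made for illustration.

```python
# Illustrative choice of hydrophobic residues; other definitions exist.
HYDROPHOBIC = set("AVLIMFW")


def window_fraction(seq, residues, window=10):
    """Fraction of residues belonging to `residues` in each window,
    indexed by the 0-based window start position."""
    return [
        sum(aa in residues for aa in seq[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


# Hypothetical toy sequence: charged flanks around a hydrophobic stretch.
toy = "DKRSE" + "LLIVAFLLIVAFLLIVAF" + "DKRES"
fractions = window_fraction(toy, HYDROPHOBIC, window=10)
for start, value in enumerate(fractions):
    print(f"{start + 1:3d} {value:.1f} " + "#" * round(value * 10))
```

A long run of windows with a fraction close to 1 is the kind of signal you would expect over a membrane-spanning segment.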

Exercise 3. All together! View repeats, CBRs, and secondary structure in the N-terminal region of huntingtin with BiasViz2:

  • Go to the BiasViz2 web page
  • Load this alignment of N-terminal huntingtins in the step 1 section
  • Load this file with secondary structure predicted for the human fragment in the step 2 section
  • Load this file with ARD2 predictions for all sequences of the alignment in the step 2 section “raw values for each amino acid”
  • Hit the “Go to graphical view” button
  • Find the CBRs we have discussed for huntingtin
  • Compare the relative position of the predicted repeats and the predicted secondary structure



Protein repeats: structures, functions, and evolution.
Andrade MA, Perez-Iratxeta C, Ponting CP.
J Struct Biol. 2001 May-Jun;134(2-3):117-131.
PMID: 11551174

Functional insights from the distribution and role of homopeptide repeat-containing proteins.
Faux NG, Bottomley SP, Lesk AM, Irving JA, Morrison JR, de la Banda MG, Whisstock JC.
Genome Res. 2005 Apr;15(4):537-551.
PMID: 15805494

A census of protein repeats.
Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D.
J Mol Biol. 1999 Oct 15;293(1):151-160.
PMID: 10512723

BiasViz: visualization of amino acid biased regions in protein alignments.
Huska MR, Buschmann H, Andrade-Navarro MA.
Bioinformatics. 2007 Nov 15;23(22):3093-3094. Epub 2007 Oct 6.
PMID: 17921493

Protein simple sequence conservation.
Sim KL, Creamer TP.
Proteins. 2004 Mar 1;54(4):629-638.
PMID: 14997559

Detection of alpha-rod protein repeats using a neural network and application to huntingtin.
Palidwor GA, Shcherbinin S, Huska MR, Rasko T, Stelzl U, Arumughan A, Foulle R, Porras P, Sanchez-Pulido L, Wanker EE, Andrade-Navarro MA.
PLoS Comput Biol. 2009 Mar;5(3):e1000304.
PMID: 19282972