What is Dotlet in Bioinformatics

Why is this knowledge important? Intensive engagement with dot plots is also worthwhile because the data structure of the 2D matrix and the concept Evaluate diagonals, in which other algorithms for sequence comparison (keyword calculation of alignments) are taken up. In addition, dot plots make it easy to compare the composition of genomes of closely related species.referenceThese exercises supplement chapter 9 "Pairwise sequence comparison".

Learning objective

After completing the exercise, you should understand:
  • The principle of the dot matrix
   

 

Some of the following examples are taken from the Dotlet package by M. Pagni and T. Junier.

 

 exerciseDotplot_1  The following sequences are given:

 

SEQ_A GHR.QS.GG
SEQ_B S.GGR.G  

 

 Calculate a dot plot with paper and pencil D..  The following condition applies to the filling of the matrix D.:  
     
Discuss the outcome, what is the longest common infix?
     solutionHere you will find the solution to the task.    Preparation for the following exercises    
Clicking this link activates Dotlet, a program that interactively generates dot plots.

Submit the following sequences via copy and paste to the applet.

To do this, press each time inputKey and give the sequences the specified names.

Read the Dotlet help file.

 

 Load sequences
MS2_HUMAN (P78325)
MRGLGLWLLGAMMLPAIAPSRPWALMEQYEVVLPRRLPGPRVRRALPSHLGLHPERVSYVLGATGHNFTLHLRKNRDLLG
SGYTETYTAANGSEVTEQPRGQDHCLYQGHVEGYPDSAASLSTCAGLRGFFQVGSDLHLIEPLDEGGEGGRHAVYQAEHL
LQTAGTCGVSDDSLGSLLGPRTAAVFRPRPGDSLPSRETRYVELYVVVDNAEFQMLGSEAAVRHRVLEVVNHVDKLYQKL
NFRVVLVGLEIWNSQDRFHVSPDPSVTLENLLTWQARQRTRRHLHDNVQLITGVDFTGTTVGFARVSAMCSHSSGAVNQD
HSKNPVGVACTMAHEMGHNLGMDHDENVQGCRCQERFEAGRCIMAGSIGSSFPRMFSDCSQAYLESFLERPQSVCLANAP
DLSHLVGGPVCGNLFVERGEQCDCGPPEDCRNRCCNSTTCQLAEGAQCAHGTCCQECKVKPAGELCRPKKDMCDLEEFCD
GRHPECPEDAFQENGTPCSGGYCYNGACPTLAQQCQAFWGPGGQAAEESCFSYDILPGCKASRYRADMCGVLQCKGGQQP
LGRAICIVDVCHALTTEDGTAYEPVPEGTRCGPEKVCWKGRCQDLHVYRSSNCSAQCHNHGVCNHKQECHCHAGWAPPHC
AKLLTEVHAASGSLPVLVVVVLVLLAVVLVTLAGIIVYRKARSRILSRNVAPKTTMGRSNPLFHQAASRVPAKGGAPS
RGPQELVPTTHPGQPARHPASSVALKRPPPAPPVTVSSPPFPVPVYTRQAPKQVIKPTFAPPVPPVKPGAGAANPGPAEG
AVGPKVALKPPIQRKQGAGAPTAP
ADAM_CROAD (P34179)
QQNLPQRYIELVVVADRRVFMKYNSDLNIIRTRVHEIVNIINGFYRSLNIDVSLVNLEIWSGQDPLTIQSSSSNTLNSEG
LWREKVLLNKKKKDNAQLLTAIEFKCETLGKAYLNSMCNPRSSVGIVKDHSPINLLVAVTMAHELGHNLGMEHDGKDCLR
GASLCIMRPGLTPGRSYEFSDDSMGYYQKFLNQYKPQCILNKP
SLIT_DROME (P24014)
MAAPSRTTLMPPPFRLQLRLLILPILLLLRHDAVHAEPYSGGFGSSAVSSGGLGSVGIHIPGGGVGVITEARCPRVCSCT
GLNVDCSHRGLTSVPRKISADVERLELQGNNLTVIYETDFQRLTKLRMLQLTDNQIHTIERNSFQDLVSLERLDISNNVI
TTVGRRVFKGAQSLRSLQLDNNQITCLDEHAFKGLVELEILTLNNNNLTSLPHNIFGGLGRLRALRLSDNPFACDCHLSW
LSRFLRSATRLAPYTRCQSPSQLKGQNVADLHDQEFKCSGLTEHAPMECGAENSCPHPCRCADGIVDCREKSLTSVPVTL
PDDTTDVRLEQNFITELPPKSFSSFRRLRRIDLSNNNISRIAHDALSGLKQLTTLVLYGNKIKDLPSGVFKGLGSLRLLL
LNANEISCIRKDAFRDLHSLSLLSLYDNNIQSLANGTFDAMKSMKTVHLAKNPFICDCNLRWLADYLHKNPIETSGARCE
SPKRMHRRRIESLREEKFKCSWGELRMKLSGECRMDSDCPAMCHCEGTTVDCTGRRLKEIPRDIPLHTTELLLNDNELGR
ISSDGLFGRLPHLVKLELKRNQLTGIEPNAFEGASHIQELQLGENKIKEISNKMFLGLHQLKTLNLYDNQISCVMPGSFE
HLNSLTSLNLASNPFNCNCHLAWFAECVRKKSLNGGAARCGAPSKVRDVQIKDLPHSEFKCSSENSEGCLGDGYCPPSCT
CTGTVVACSRNQLKEIPRGIPAETSELYLESNEIEQIHYERIRHLRSLTRLDLSNNQITILSNYTFANLTKLSTLIISYN
KLQCLQRHALSGLNNLRVVSLHGNRISMLPEGSFEDLKSLTHIALGSNPLYCDCGLKWFSDWIKLDYVEPGIARCAEPEQ
MKDKLILSTPSSSFVCRGRVRNDILAKCNACFEQPCQNQAQCVALPQREYQCLCQPGYHGKHCEFMIDACYGNPCRNNAT
CTVLEEGRFSCQCAPGYTGARCETNIDDCLGEIKCQNNATCIDGVESYKCECQPGFSGEFCDTKIQFCSPEFNPCANGAK
CMDHFTHYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNHEC
KHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTRPEANVTIVFSSAEQNGILMYDGQDAHLAV
ELFNGRIRVSYDVGNHPVSTMYSFEMVADGKYHAVELLAIKKNFTLRVDRGLARSIINEGSNDYLKLTTPMFLGGLPVDP
AQQAYKNWQIRNLTSFKGCMKEVWINHKLVDFGNAQRQQKITPGCALLEGEQQEEEDDEQDFMDETPHIKEEPVDPCLEN
KCRRGSRCVPNSNARDGYQCKCKHGQRGRYCDQGEGSTEPPTVTAASTCRKEQVREYYTENDCRSRQPLKYAKCVGGCGN
QCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRKCGCTKKCY
SERA_PLAFG (P13823)
MKSYISLFFILCVIFNKNVIKCTGESQTGNTGGGQAGNTVGDQAGSTGGSPQGSTGASQPGSSEPSNPVSSGHSVSTVSV
SQTSTSSEKQDTIQVKSALLKDYMGLKVTGPCNENFIMFLVPHIYIDVDTEDTNIELRTTLKETNNAISFESNSGSLEKK
KYVKLPSNGTTGEQGSSTGTVRGDTEPISDSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSESLPANGPDSPTVKP
PRNLQNICETGKNFKLVVYIKENTLIIKWKVYGETKDTTENNKVDVRKYLINEKETPFTSILIHAYKEHNGTNLIESKNY
ALGSDIPEKCDTLASNCFLSGNFNIEKCFQCALLVEKENKNDVCYKYLSEDIVSNFKEIKAETEDDDEDDYTEYKLTESI
DNILVKMFKTNENNDKSELIKLEEVDDSLKLELMNYCSLLKDVDTTGTLDNYGMGNEMDIFNNLKRLLIYHSEENINTLK
NKFRNAAVCLKNVDDWIVNKRGLVLPELNYDLEYFNEHLYNDKNSPEDKDNKGKGVVHVDTTLEKEDTLSYDNSDNMFCN
KEYCNRLKDENNCISNLQVEDQGNCDTSWIFASKYHLETIRCMKGYEPTKISALYVANCYKGEHKDRCDEGSSPMEFLQI
IEDYGFLPAESNYPYNYVKVGEQCPKVEDHWMNLWDNGKILHNKNEPNSLDGKGYTAYESERFHDNMDAFVKIIKTEVMN
KGSVIAYIKAENVMGYEFSGKKVQNLCGDDTADHAVNIVGYGNYVNSEGEKKSYWIVRNSWGPYWGDEGYFKVDMYGPTH
CHFNFIHSVVIFNVDLPMNNKTTKKESKIYDYYLKASPEFYHNLYFKNFNVGKNLFSEKEDNENNKKLGNNYIIFGQDT
AGSGQSGKESNTALESAGTSNEVSERVHVYHILKHIKDGKIRMGMRKYIDTQDVNKKHSCTRSYAFNPENYEKCVNLCNV
NWKTCEEKTSPGLCLSKLDTNNECYFCYV
EMECALM (J05545)
TGAATCCCAGTTCAGCTCTTCAGCCTTTCGTGGATAAGAGAAGGCTGAAAGCGGGTCACGTTTTGGACTAAGCGACGCCC
TTGCCAGGCATCCAGCTTAGTGGCTGTTGGTTTATTTGTAGAGTCCCCTTAACTCTCTCTCTCCCCCACATCGCCCATCTCC
ACCGACGCCTCTCTCTCTCGTGTTATTTCTCCCCATTCTCGCTTCATTTCCCATCCATTTTCGAGTTCTGCAATATCCTC
ACTAACTAGTATAGCCATGGTACGCCTCACTCGATCATCATCGTTGTTCGTGCGCTCAAACGCATCCGCTGTGCGGGGCA
GATCTACTGGTGTCCTCCTGCGTAGATGAGCTGACGACTTCACTTCCAGGCCGACTCTCTGACCGAAGAGCAAGTTTCCG
AGTACAAGGAGGCCTTCTCCCTATTTGTAAGTGCCATTGGTTACTGTTATATCAAAATCGAATTTGTATTGAGAGTATAC
TAATACATTCCGCACTAAACAGGACAAGGATGGCGATGGTTAGTGCATCTGTCCCCCCAGGCTTGATCGCATTCGCCCAG
CATGTCTGCTGTAGCTCTATATAACCGTTTCTGACAAACGGCGACAGGCCAGATTACCACTAAGGAGCTTGGCACTGTCA
TGCGCTCGCTCGGTCAGAATCCTTCAGAGTCTGAGCTTCAGGACATGATCAACGAAGTTGACGCCGACAACAATGGCACC
ATTGACTTTCCAGGTACGCGAACTCCCCAATCTACTTCGCACCAGCCTAGAAATGTACTAATGCTAAACAGAGTTCCTTA
CCATGATGGCCAGAAAGATGAAGGACACCGATTCCGAGGAGGAAATTCGGGAGGCGTTCAAGGTCTTCGACCGTGACAAC
AATGGTTTCATCTCCGCTGCTGAGCTGCGTCACGTCATGACCTCGATCGGTGAGAAGCTCACCGATGACGAAGTCGACGA
GATGATCCGCGAGGCGGACCAGGATGGCGACGGCCGAATTGACTGTACGTTGGCTCCCCGCTTATCCTTGACCGTAGAAG
AGGTATGATACTGATCGGCTGCAGACAACGAATTCGTCCAACTTATGATGCAAAAATAAACGCTCTTACCTTTGATGTTT
ATCGTTAGCGAAGAAGGTGTGGACACTTTCCAGCTGTCTCATCTTAGTTGTCATATCATTGAATGTAGCCTATCTGATTG
CGGATAAGCAACTGATGGTTGTAACGGCTTCCATTTTGCTCTGACTTCTGAGTACCCTTTTCCTTCATGTTTGTTCGTCG
ACCATTCTGCTAGTGAGATATGCGTAGAGTTGGGTAGGCTGAATTTACGAGTCTCTGTTGGGGGATATCACATGCTTCAC TACAATCTTTCTCTAC
CALM_EMENI (P60204)
MADSLTEEQVSEYKEAFSLFDKDGDGQITTKELGTVMRSLGQNPSESELQDMINEVDADNNGTIDFPEFLTMMARKMKDTD
SEEEIREAFKVFDRDNNGFISAAELRHVMTSIGEKLTDDEVDEMIREADQDGDGRIDYNEFVQLMMQK
       exerciseDotplot_2   
Compare the SLIT_DROME sequence to yourself. How do you interpret the pattern?

 

 

Set the zoom factor to 1: 5.

Adjust the controls so that identical regions are clearly visible.

Navigate with the mouse to determine at which positions within the sequence there are similar partial sequences.

You can find the database entry here. See if the regions you identify are in the section Features are listed and annotated.

 

exerciseDotplot_3

 

Compare the sequences MS2_HUMAN with ADAM_CROAD.
Both sequences contain a zinc protease domain.

 
At which positions in the sequences are the domains located?

 

 

You can find the database entry here. See if the regions you have identified are in the Features are listed and annotated.

Now use as a matrix:Identity and note where the domains are in the two proteins.

 Confirm your analysis by checking the location of the domains with the SMART server. Transfer one of the named sequences and start by pressing the button Sequence SMART the analysis.
 exerciseDotplot_4

Compare the SERA_PLAFG sequences with yourself. Use the BLOSUM 30 matrix.

The sequence contains a region of low complexity, in this case a sequence of more than 30 serine residues.

 

In which positions do you find region (s) of low complexity?

 

  exerciseDotplot_5, introns and exons

 

In the 1st window (horizontal) select the sequence of the calmodulin gene (EMECALM) and
in the 2nd window that of the gene product CALM_EMENI.

 

Use the BLOSUM 100 matrix and one Sliding window of length 7.

Can you see the intron / exon structure?
What conditions must exist for such an analysis to be carried out?

 

 

Before the comparison, Dotplot translates the DNA sequence into the protein sequence in all three reading frames.

 

Note

The calculation and output can take some time.

In case you are familiar with the terms Intron and Exon are not familiar, please try the Internet.

 

What you should have understood by now

The 2D matrix is ​​an important data structure for the pairwise analysis of sequences. Areas with identical partial sequences are noticeable through diagonal elements, insertions or deletions through gaps in one of the sequences.