Friday, September 16, 2022

Reverse engineering the T-Cell Receptor proteins for a T-Cell that can kill virus infected cells.

Today I want to continue my discussion of our fascinating adaptive immune system by considering a specific Killer-T-Cell that is able to bind to cells infected with the Human T-lymphotropic Virus (HTLV1). I then want to show you how one would go about reverse engineering the receptor proteins of this specific T-Cell to find out which of the gene segments in the T-Cell Receptor gene clusters were actually used to build it.

Please start by reading my previous post, where I discussed how these gene segments work.

The term T-lymphotropic means that it is a virus which targets T-Cells. It is also known as Human T-cell leukaemia virus type I (viralzone link). It is a reverse transcribing virus like HIV, and it sometimes causes blood cancer.

Remember how Killer/Cytotoxic T-Cells are the sidekicks of B-Cells and Natural Killer cells, and how they fill a very important role in fighting against pathogens:
  • B-Cells produce antibodies which can bind to matching antigens of pathogens (outside of cells) and disable them OR allow other cells to easily find and bind to them by providing a convenient Fc region that those cells (like innate immune system neutrophils and macrophages) can attach to with their Fc receptors in order to more easily kill those enemies.
  • Killer T-Cells are able to determine if a cell is infected with a known pathogen by inspecting the MHC1 proteins on the surface of cells, which are continually displaying small peptide parts of the pathogens (small amino acid segments are called peptides), and checking whether their T-Cell receptors can bind to them.  When they have determined that their unique receptors can bind the antigen, and they receive the required confirmatory signals that it is indeed a pathogen that they have previously been warned about, they then instruct the infected cells to kill themselves gracefully.
  • Natural Killer cells check for cells that are not displaying any peptides in MHC1 proteins, and when they find such a cell they send it a similar command (as T-Cells do) to self destruct.
T-Cells will not start killing unless they have been properly trained for the job.  During development in the Thymus (where T-Cells derive their name from), they are checked to make sure:
  • That they have T-Cell receptor proteins on their surface that have been properly formed from the Alpha and Beta chains encoded by the TRA and TRB gene segments.
  • That their T-Cell receptors are able to bind to peptides presented to them in your own body's MHC1 proteins. This is partly achieved by the CD8 protein which will bind to a part of the MHC1 protein during docking.
  • That these T-Cells will not react to the body's own proteins (if they do, they are immediately ordered to self destruct) 
  • That they do not bind too weakly or too strongly to their specific antigen, but just the correct amount (like in Goldilocks and the three bears).
When they have passed all of these tests they are allowed to leave the thymus and move to the lymph nodes, where they need to be activated by professional antigen presenting cells (APC) like the dendritic cells of the innate immune system.  Dendritic cells are like the spies of the immune system, collecting samples of dangerous pathogens and reporting back to headquarters (the lymph nodes) in order to identify the T-Cells with the correct receptors which are able to neutralize the enemy.  Once they find this T-Cell "operative" (out of billions of possible ones), they use a second "danger" signal to stimulate the T-Cell's CD28 protein receptor to indicate that the peptide which is currently being displayed to the T-Cell agent on the dendritic cell's MHC1 protein is indeed part of an enemy's make-up and that this special "operative" has the correct skills for this job (i.e. the correct shuffling of its gene segments).  The T-Cell will then use its CD8 surface protein to connect to a part of the MHC1 protein on the dendritic cell surface and pull it closer.  There is a very close relationship between your own CD8 and MHC1 (as well as your CD4 and MHC2 proteins in the case of Helper T-Cells).  This activates the Killer-T-Cell, which now has a license to kill the body's own cells if it again comes across a cell presenting this antigen peptide.  Once activated, the Killer-T-Cells multiply and leave the lymph nodes to patrol our body in search of the pathogens that they uniquely can recognise.

Footnote: MHC (the Major Histocompatibility Complex) is also known for causing organ rejection between mismatched donors and recipients, because your body will recognise somebody else's MHC as foreign and try to kill the transplanted cells, mainly via T-Cells and also with help from Natural Killer Cells.

When cells are infected by viruses, they start emitting an alarm signal in the form of a chemical messenger called Type I interferon (IFN-alpha and IFN-beta).  This will stimulate surrounding cells to start producing more MHC1 display proteins, and they will present all kinds of protein parts they have digested on the surface MHC1 receptors for T-Cells to clearly "smell".  Antibodies are generally not able to enter cells, and that is why Killer-T-Cells are so important.  When a Killer-T-Cell bumps into an MHC1 receptor presenting its matching peptide on the surface of an infected cell, it will immediately spring into action to exercise its "license-to-kill" and get rid of the infected cell.  All of the cell's machinery, including the viral proteins, will be digested and destroyed, lock-stock-and-barrel, eliminating the spread of the virus via that cell.

Short peptides from a pathogen (of around 9 amino acids in length) are carried from the inside of a cell and presented in a groove of the MHC1 protein molecule.  In the following image you can see how a part of the Human T-lymphotropic Virus protein is held very tightly in a groove of the MHC1 protein.

Different people have MHC1 proteins which are slightly better or slightly worse at presenting peptides from different pathogens, which is why some people are genetically better able to cope with certain viruses than other people.  Each person's MHC1 is simply better or worse at binding to different parts of a broken down virus protein.



The following shows the secondary structure of the MHC1 groove in which the peptide amino acids are presented to the T-Cell Receptors.  The amino acids are labelled and colour coded.  You can see alpha helices as well as beta sheets.


The specific amino acids in the TAX protein peptide (depicted in grey) are: LLFGYPVYV

Leucine-Leucine-Phenylalanine-Glycine-Tyrosine-Proline-Valine-Tyrosine-Valine

This specific sequence can be found in the genome of the HTLV1 virus at base positions 6977-7003 of the 8507 base genome.
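As a quick sanity check, a peptide of 9 amino acids needs 9 x 3 = 27 coding bases, which is exactly the size of the span 6977-7003. Here is a small Python sketch of that arithmetic. It is not part of the Visual Genome Browser, and the function names are just my own for illustration:

```python
# One-letter to full amino acid names (only the letters used in this peptide).
AMINO_NAMES = {
    "L": "Leucine", "F": "Phenylalanine", "G": "Glycine",
    "Y": "Tyrosine", "P": "Proline", "V": "Valine",
}

def spell_out(peptide):
    """Write a one-letter peptide as hyphen-separated amino acid names."""
    return "-".join(AMINO_NAMES[aa] for aa in peptide)

def span_length(start, end):
    """Number of bases in an inclusive, 1-based coordinate range."""
    return end - start + 1

peptide = "LLFGYPVYV"
print(spell_out(peptide))
print(span_length(6977, 7003), "bases for", len(peptide), "amino acids")
```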
In the following diagrammatic 2D representation of the complete HTLV1 virus genome, I have indicated where this specific peptide (which is part of the TAX protein) can be found.

I have used the Visual Genome Browser to depict the amino acids coded for by the genome bases in the TAX protein with the specific peptide that the T-Cell receptor can bind to highlighted in white.



The genome exists inside the virus particle as single stranded RNA which is reverse transcribed into DNA and then integrates into the cell DNA (sometimes causing leukaemia when it breaks an important gene during integration).  When the infected cell's ribosome protein "printing machines" are hijacked to manufacture this TAX protein, the 3 letter coding bases in the viral genome are translated into a protein of 359 amino acids.  The TAX protein is a transcription activator which will "awake" the virus that has integrated into the human DNA and cause it to be transcribed into messenger RNA, thus resurrecting the virus from its "slumber".

But when the viral TAX protein gets digested into small peptides, the MHC1 protein will pick up pieces of the protein and "present" these pieces on the cell surface to Killer T-Cells.

In the following image you can see how snugly the TAX protein peptide "fits" like a puzzle piece to the T-Cell Receptor Alpha chain (depicted in pink) and the Beta chain (depicted in green).  The tight binding to the viral peptide is brought about by the very serendipitous arrangement of the V, D and J segments in the T-Cell Receptor alpha and beta gene clusters.  Exactly the right gene segments were included in the recombination "shuffling" during development of this T-Cell to make this all possible.  

This Killer T-Cell that has these receptors existed in the body all along, but it took an antigen presenting dendritic cell (APC) to identify the correct T-Cell for the job in the lymph node. 

The peptide is ALWAYS presented inside an MHC1 protein groove, but I have left the MHC1 out of the image above to show you the close binding of the virus peptide to the T-Cell receptor CDRs (Complementarity Determining Regions), which are the amino acids of the T-Cell receptor proteins that are responsible for recognising and binding to the presented peptide.

Normally, it looks as follows:
The MHC1 is bound to the peptide from the bottom and the T-Cell receptor is bound at the top.  The T-Cell receptor Alpha and Beta chains are attached to the surface of the T-Cell and the MHC1 surface receptor is attached to the infected cell.

This is a match like a KEY IN A LOCK. It then sets in motion the "gears" inside the T-Cell that will eventually lead to the self destruction of the cell with this virus peptide on its surface.
The presence of a part of the virus, presented "on a platter" in the MHC1 groove, is a tell-tale sign that the cell has been infected with the T-Cell Leukaemia virus and that it is silently making thousands of copies of the virus inside the cell.



If you want to explore the 3D structure of the above complex for yourself, you can find it here on the Protein Databank Website for the entry: 1AO7



In this same way, all kinds of proteins, from our own body as well as from invading viruses, are being digested into small peptide sequences and then presented in MHC1 receptors on the surface of all of our body cells (except red blood cells) for inspection by the Killer-T-Cells.  There would normally not be Killer-T-Cells that will target normal body proteins because they would have been eliminated by the strict screening process in the thymus.

If you are interested in more T-Cell to MHC "docking" examples, have a look at this data.  It comes from an article on the topic.

Next, I will show you how one would go about finding the specific gene segments which were stitched together to produce these specific T-Cell receptor proteins that are able to recognise this virus peptide so elegantly.

The first step is to download the actual sequences of amino acids that make up the different chains of the T-Cell Receptors. This is done by clicking on the Download-FASTA menu item:



This will provide you with a FASTA text file containing the following sequences:

>1AO7_1|Chain A|HLA-A 0201|Homo sapiens (9606)

GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGQEQRYTCHVQHEGLPKPLTLRWE

(This is the main chain of the MHC1 protein, which interacts with the T-Cell receptor and which also holds the peptide being presented. HLA stands for Human Leukocyte Antigen. This is also the protein that differs so much between different people, making organ transplantation very difficult.)



>1AO7_2|Chain B|BETA-2 MICROGLOBULIN|Homo sapiens (9606)

MIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYCTEFTPTEKDEYACRVNHVTLSQPCIVKWDRDM

(This is part of the MHC1 protein complex, but it does not interact directly with the T-Cell receptor.)  See this Wikipedia article.


>1AO7_3|Chain C|TAX PEPTIDE|Human T-lymphotropic virus 1 (11908)

LLFGYPVYV

(This is the Viral Peptide sequence)



>1AO7_4|Chain D|T CELL RECEPTOR ALPHA|Homo sapiens (9606)

KEVEQNSGPLSVPEGAIASLNCTYSDRGSQSFFWYRQYSGKSPELIMSIYSNGDKEDGRFTAQLNKASQYVSLLIRDSQPSDSATYLCAVTTDSWGKLQFGAGTQVVVTPDIQNPDPAVYQLRDSKSSDKSVCLFTDFDSQTNVSQSKDSDVYITDKTVLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPSPESS

(This is the T-Cell Alpha chain protein sequence)


>1AO7_5|Chain E|T CELL RECEPTOR BETA|Homo sapiens (9606)

NAGVTQTPKFQVLKTGQSMTLQCAQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASRPGLAGGRPEQYFGPGTRLTVTEDLKNVFPPEVAVFEPSEAEISHTQKATLVCLATGFYPDHVELSWWVNGKEVHSGVSTDPQPLKEQPALNDSRYALSSRLRVSATFWQNPRNHFRCQVQFYGLSENDEWTQDRAKPVTQIVSAEAWGRAD

(This is the T-Cell Beta chain protein sequence)

The sequences highlighted in blue are the ones we want to try and reverse engineer on the human genome.
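If you would rather pick out the two T-Cell Receptor chains with a script than by eye, a FASTA file is easy to read. The following is only a minimal Python sketch; the function names and the file name in the usage comment are my own assumptions and not something the Protein Databank or the Visual Genome Browser provides:

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {header line: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between header and sequence
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may be wrapped over lines
    return records

def chains_matching(records, keyword):
    """Keep only the records whose header contains the keyword."""
    return {h: s for h, s in records.items() if keyword.upper() in h.upper()}

# Usage, assuming the download was saved under this (hypothetical) file name:
# with open("rcsb_pdb_1AO7.fasta") as handle:
#     records = read_fasta(handle.read())
# tcr_chains = chains_matching(records, "T CELL RECEPTOR")
```

With the file shown above this would leave you with just the Chain D (Alpha) and Chain E (Beta) entries.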


We start off by locating where the T-Cell Receptor Alpha gene is located on the human genome:

I just type in TRAV in the GENES field. (Also obtained by just pressing CTRL-G)


Selecting any of them and pressing Enter immediately jumps to Chromosome 14 and highlights where the TRAV genes can be found.


You can also filter the display to only show the required genes starting with TRA or TRB.



Now make sure you have built the local BLAST search database for chromosome 14:



The next step is to paste the T-Cell Receptor Alpha sequence into the first align search box:

KEVEQNSGPLSVPEGAIASLNCTYSDRGSQSFFWYRQYSGKSPELIMSIYSNGDKEDGRFTAQLNKASQYVSLLIRDSQPSDSATYLCAVTTDSWGKLQFGAGTQVVVTPDIQNPDPAVYQLRDSKSSDKSVCLFTDFDSQTNVSQSKDSDVYITDKTVLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPSPESS


This will execute the BLAST (Basic Local Alignment Search Tool) command:

tblastn.exe -task tblastn -evalue 1 -num_threads 4 -max_target_seqs 10 -outfmt "6 qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore frames sseq" -db "E:\Genomes\hg38\Blast\hg38" -query "E:\Genomes\hg38\Temp_hg38\Query.fa" -out "E:\Genomes\hg38\Temp_hg38\QueryResults.txt" -seqidlist "E:\Genomes\hg38\Temp_hg38\QuerySequenceIds.txt"

It will give you the BLAST output:

The Visual Genome Browser will then interpret the matches and provide an output in which the gene segment names are shown against the highest matching entries from BLAST.
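If you want to look at the raw results file yourself, the tabular "outfmt 6" output is simply one tab-separated line per hit, with the columns in the order given in the command above. The following Python sketch shows one way to read it. This is my own illustration and not the code that the Visual Genome Browser uses; the function names are assumptions:

```python
# Column names copied from the -outfmt string of the tblastn command.
COLUMNS = ("qaccver saccver pident length mismatch gapopen qstart qend "
           "sstart send evalue bitscore frames sseq").split()
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}

def parse_hits(lines):
    """Turn tab-separated BLAST result lines into a list of dicts."""
    hits = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        hits.append(hit)
    return hits

def best_hits_first(hits):
    """Sort hits so that the highest bit score comes first."""
    return sorted(hits, key=lambda hit: hit["bitscore"], reverse=True)

# Usage, with the output path from the command above:
# with open(r"E:\Genomes\hg38\Temp_hg38\QueryResults.txt") as handle:
#     for hit in best_hits_first(parse_hits(handle)):
#         print(hit["qstart"], hit["qend"], hit["sstart"], hit["send"], hit["pident"])
```

The qstart/qend columns tell you which part of the query protein was matched, and sstart/send tell you where on the chromosome it was found, which is what lets a match be tied to a named gene segment.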


The ALPHABET letters indicate which gene segments are most likely to match that part of the T-Cell Receptor sequence.


This indicates that this T-Cell Alpha sequence is highly likely made up of:

TRAV12-2  (Variable gene segment)

TRAJ24 (Joining gene segment)

TRAC (Constant gene segment)

And this is indeed the case. When I align the query sequence from the Protein Databank with the protein from the HG38 Human genome sequence I get:

The top sequence represents the query sequence while the bottom sequence represents the actual amino acids obtained from the human reference genome HG38.

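From this output on the "Comparisons" tab you can see that there is a 99.02% identity/match with the query sequence for this combination of V, J and C segments.

202 of the 204 amino acids in the query match (202/204 = 99.02%).  The protein assembled from the genome is 275 amino acids long, so 73 of its positions have no match in the query.  Most of these are the 22 amino acid leader at the start and the 47 amino acids at the end of the constant segment, neither of which is present in the Protein Databank sequence.

The join between V and J happens after amino acid 113 and the constant segment starts after amino acid 134.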

We get the best match when we select:
TRAV12-2 => 1-113  (Bases=340/3  remaining bases=1)
TRAJ24 => 114-134  (Bases=63/3   remaining bases=0)
TRAC => 135-276    (Bases=425/3  remaining bases=2)

Similarity = 99.02 %   (Exact:202 + Similar:0)/Total:275 Diff:73  (202/275)
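The amino acid ranges and the "remaining bases" above follow directly from the number of bases in each segment. Here is a small Python sketch of that bookkeeping as I understand it from the output. The function name is my own, and the 204 is simply the length of the Chain D query sequence:

```python
def segment_ranges(segments):
    """segments: list of (name, number_of_bases) in the order they are joined.

    Returns (name, first_amino_acid, last_amino_acid, remaining_bases), where
    a segment "owns" every codon that is completed within it.
    """
    ranges = []
    total_bases = 0
    previous_end = 0
    for name, bases in segments:
        total_bases += bases
        end = total_bases // 3          # last complete codon so far
        ranges.append((name, previous_end + 1, end, bases % 3))
        previous_end = end
    return ranges

alpha = [("TRAV12-2", 340), ("TRAJ24", 63), ("TRAC", 425)]
for name, first, last, remaining in segment_ranges(alpha):
    print(f"{name} => {first}-{last}  (remaining bases={remaining})")

# 202 matching amino acids out of the 204 in the query sequence.
print(f"Similarity = {100 * 202 / 204:.2f} %")
```

This prints the same 1-113, 114-134 and 135-276 ranges as the browser, followed by 99.02 %.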


1         11        21        31        41        51        61        71

MKSLRVLLVILWLQLSWVWSQQKEVEQNSGPLSVPEGAIASLNCTYSDRGSQSFFWYRQYSGKSPELIMFIYSNGDKEDG

81        91        101       111       121       131       141       151       

RFTAQLNKASQYVSLLIRDSQPSDSATYLCAVNMTTDSWGKFQFGAGTQVVVTPDIQNPDPAVYQLRDSKSSDKSVCLFT

161       171       181       191       201       211       221       231       

DFDSQTNVSQSKDSDVYITDKTVLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPSPESSCDVKLVEKSFET

241       251       261       271        

DTNLNFQNLSVIGFRILLLKVAGFNLLMTLRLWSS

There is no Diversity (D) segment in the Alpha chain.

This provides the used bases as well as the resulting amino acids:
>chr14:22519968-22520030 (Bases=63, Codons=21) 
GTGACAACTGACAGCTGGGGGAAATTCCAGTTTGGAGCAGGGACCCAGGTTGTGGTCACCCCA

>Protein of chr14:22519968-22520030 (L=21)
VTTDSWGKFQFGAGTQVVVTP
which matches the sequence we are looking for in every position except the first (V instead of M):
MTTDSWGKFQFGAGTQVVVTP
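You can check this translation yourself with the standard genetic code. The following is a minimal Python sketch (my own illustration, not the browser's code) which translates the 63 bases shown above, reading them three at a time:

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): amino
               for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate DNA in its first reading frame; leftover bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

traj24 = "GTGACAACTGACAGCTGGGGGAAATTCCAGTTTGGAGCAGGGACCCAGGTTGTGGTCACCCCA"
print(translate(traj24))
```

This prints VTTDSWGKFQFGAGTQVVVTP, the same 21 amino acids as the browser reports for chr14:22519968-22520030.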




When we go to the TRAJ24 gene segment in the genome:


Make sure the display settings are as follows:



We can search for the protein with 1 mismatch in the amino sequence by putting the MTTDSWGKFQFGAGTQVVVTP sequence in the Search box. This will search for the protein in all of the 6 reading frames.


Because we know there might be one or more amino acids not matching, due to gene segments not always having a multiple of 3 bases to make up full codons, we put 1 in the mismatches field as depicted above.  This highlights the matching sequence in the genome:
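To give an idea of what such a search involves, here is a rough Python sketch of a protein search over all 6 reading frames that tolerates a number of mismatches. This is only my own simplified illustration of the idea; the function names are assumptions and the browser's actual implementation may well differ:

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): amino
               for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate in the first frame; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frames(dna):
    """Yield (strand, offset, protein) for the 3 forward and 3 reverse frames."""
    dna = dna.upper()
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            yield strand, offset, translate(sequence[offset:])

def find_protein(dna, peptide, max_mismatches=1):
    """Return (strand, offset, amino_acid_index, mismatches) for every hit."""
    hits = []
    for strand, offset, protein in six_frames(dna):
        for start in range(len(protein) - len(peptide) + 1):
            window = protein[start:start + len(peptide)]
            mismatches = sum(a != b for a, b in zip(window, peptide))
            if mismatches <= max_mismatches:
                hits.append((strand, offset, start, mismatches))
    return hits

traj24 = "GTGACAACTGACAGCTGGGGGAAATTCCAGTTTGGAGCAGGGACCCAGGTTGTGGTCACCCCA"
print(find_protein(traj24, "MTTDSWGKFQFGAGTQVVVTP", max_mismatches=1))
```

With 0 mismatches allowed this search finds nothing in these 63 bases, because of the first amino acid. With 1 mismatch allowed it finds the forward frame starting at the first base.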


You can actually use a tool to determine what amino acid sequence is coded for by demarcating it in the genome as follows: 

After using the protein coding tool, now clear the search field to reveal the newly encoded protein:




Also change the display settings as shown in order to have the browser show protein letters on the genome.

You can now observe the protein in the new reading frame: TTDSWGKFQFGAGTQVVVT


There is a tool in the Human Genome Browser that allows you to play around with selecting different gene segments to see how good a match you can get.








Now let us do the same with the T-Cell Receptor Beta chain:


>1AO7_5|Chain E|T CELL RECEPTOR BETA|Homo sapiens (9606)

NAGVTQTPKFQVLKTGQSMTLQCAQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASRPGLAGGRPEQYFGPGTRLTVTEDLKNVFPPEVAVFEPSEAEISHTQKATLVCLATGFYPDHVELSWWVNGKEVHSGVSTDPQPLKEQPALNDSRYALSSRLRVSATFWQNPRNHFRCQVQFYGLSENDEWTQDRAKPVTQIVSAEAWGRAD


We navigate to Chromosome 7 and again use a filter:   op_gene^TRB





After again running the "Search Align", it has found:

TRBV6-5 to be the Variable segment

TRBC2 to be the constant segment

But we are not sure which makes up the Diversity (D) and Joining (J) segments.

This time we start by selecting the TRBV6-5 and TRBC2 segments, which we are more than 96% certain of.

Then we press CTRL+select any joining segment.  This will go through all of the joining segments and then compare the resulting protein with the query sequence entered in the top text box on the Comparisons tab. It will keep the best matching one.
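In spirit, this "try them all and keep the best" step looks something like the following Python sketch. Please note that this is only my own illustration: the scoring here uses Python's difflib as a stand-in because it tolerates small insertions, it is not necessarily how the browser scores a match, and the function names are assumptions:

```python
from difflib import SequenceMatcher

def matched_fraction(query, candidate):
    """Fraction of the query that lines up with identical letters in candidate."""
    matcher = SequenceMatcher(None, query, candidate, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(query)

def best_joining_segment(query, v_protein, j_candidates, c_protein):
    """j_candidates: dict of {segment name: amino acids it would contribute}.

    Assembles V + J + C for every candidate J and keeps the best scoring one.
    """
    scores = {name: matched_fraction(query, v_protein + j_protein + c_protein)
              for name, j_protein in j_candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

You would call it with the query sequence from the Protein Databank, the amino acids of the already chosen V and C segments, and a dictionary with the amino acids of every candidate joining segment.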

After doing this we find that the diversity and joining segments are likely:

TRBD1

and

TRBJ2-7



The following protein sequence is assembled from:

TRBV6-5 => 1-114      (Bases=344/3  remaining bases=2)
TRBD1 => 115-118      (Bases=12/3   remaining bases=0)
TRBJ2-7 => 119-134    (Bases=47/3   remaining bases=2)
TRBC2 => 135-314      (Bases=539/3  remaining bases=2)

1         11        21        31        41        51        61        71          

MSIGLLCCAALSLLWAGPVNAGVTQTPKFQVLKTGQSMTLQCAQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVP

81        91        101       111       121       131       141       151  

NGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSGQGASYEQYFGPGTRLTVTEDLKNVFPPKVAVFEPSEAEISHTQK

161       171       181       191       201       211       221       231    

ATLVCLATGFYPDHVELSWWVNGKEVHSGVSTDPQPLKEQPALNDSRYCLSSRLRVSATFWQNPRNHFRCQVQFYGLSEN

241       251       261       271       281       291       301       311    

DEWTQDRAKPVTQIVSAEAWGRADCGFTSESYQQGVLSATILYEILLGKATLYAVLVSALVLMAMVKRKDSRG

We can see that bases remaining from the previous segment will still contribute to the next segment if you look at the remaining bases. This is because segments are not always a multiple of 3 bases to make full codons.

When we jump to the TRBJ2-7 gene segment we can see the Joining gene segment:



The browser will now show the amino acids that are encoded in the normal reading frame:


We want to see how this segment produces : ASYEQYFGPGTRLTVT


This will then show where the protein sequence matches on the genome in one of the 6 reading frames and we see that there is a protein which starts 2 bases earlier:


We now use the feature that will look for proteins coded on the genome:

This will generate the protein that is formed by this reading frame:

>chr7:142797454-142797501 (Bases=48, Codons=16)
TGCTCCTACGAGCAGTACTTCGGGCCGGGCACCAGGCTCACGGTCACA

>Protein of chr7:142797454-142797501 (L=16)
CSYEQYFGPGTRLTVT



When we then compare it with the protein sequence we are looking for:
ASYEQYFGPGTRLTVT  
CSYEQYFGPGTRLTVT

 (The first letter mismatches because the bases at the join between D and J are not a multiple of 3 and so do not make up a full codon of 3 bases.)

We follow the same procedure for gene segment TRBD1:


This time we put SGQGA in the search box:


>chr7:142786211-142786225 (Bases=15, Codons=5)
TGGGGACAGGGGGCC

>Protein of chr7:142786211-142786225 (L=5)
WGQGA
SGQGA
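To make these comparisons a little more explicit, here is a tiny Python sketch (my own, with an assumed function name) that lists the positions at which the sequence we are looking for differs from what the genome encodes:

```python
def differences(wanted, genome):
    """Return (position, wanted_letter, genome_letter) for each mismatch.

    Positions are 1-based and both sequences must have the same length.
    """
    if len(wanted) != len(genome):
        raise ValueError("sequences must be the same length")
    return [(position, a, b)
            for position, (a, b) in enumerate(zip(wanted, genome), start=1)
            if a != b]

print(differences("ASYEQYFGPGTRLTVT", "CSYEQYFGPGTRLTVT"))  # TRBJ2-7
print(differences("SGQGA", "WGQGA"))                        # TRBD1
```

In both cases only the first amino acid differs (A instead of C, and S instead of W), which is the position that spans a join between two segments.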


So in summary: we have now used a local BLAST search in addition to a method which constructed proteins by searching for the best match to the reference sequence.  This allowed us to get very close matches to the Protein Databank proteins:

T-Cell Receptor Alpha:
TRAV12-2 => 1-113  
TRAJ24 => 114-134  
TRAC => 135-276    
Similarity = 99.02 %  

T-Cell Receptor Beta:
TRBV6-5 => 1-114   
TRBD1 => 115-118      
TRBJ2-7 => 119-134    
TRBC2 => 135-314     
Similarity = 96.122 % 

If you want to learn more about activation of Cytotoxic (CD8 positive or Killer)-T Cells via Dendritic cells, please go and read the following excellent article on the topic:

Activation of CD8 T Lymphocytes during Viral Infections


