Basic protein sequence analysis

4/6/2023

There are several steps in analyzing a protein, and there is not room here to cover even a single one of them. The majority of protein sequence analysis today uses mass spectrometry. The details of these data reveal basic information such as gene and protein structures, and may lead us to major discoveries like gene-disease associations.

The protein folding problem is ill-defined and most likely unsolvable in the classic sense of the axiomatic, reductionist sciences. For a senior member of the structural biology community, it shows both the yearning for practical solutions and a naivety about the real difficulties in even defining the problem. If the old Anfinsen view about the minimum of global free energy were true, there would never have been life on Earth or anywhere else. The new paradigm that I partially formulated is that every protein (as a matter of fact, every macromolecule of life) has its own specific recipe for combining structural and dynamical features to accomplish the desired function (PNAS 106 (2009) 10505). It means that there is a unique proportion of conditionally stable structure to conditionally unstable elements for performing a given function. Depending on the conditions and their possible changes, the protein is supposed to perform a certain function. This is what physicists call "self-organized criticality". There are millions of examples to support this view.

Why is AI so good? Because the protein folding problem is a practical problem, not a theoretical one. AI simply captures the set of heuristic rules, particularly dynamical coupling with the solvent, better than other methods. This is far from being equivalent to having a definite structure, a possible unique binding site, or anything that can remotely be called a solution. This ill-defined condition is not solvable by classic axioms, but it just might be partially solvable by fuzzy methods such as AI. If anybody wants to engage I will be happy to, but not now.
I am doing a phylogenetic analysis of two sets of proteins (A and B) that are functionally very closely related and share a "large degree" of sequence similarity. Proteins consist of one or more polypeptides, so I will use the terms polypeptide and protein interchangeably. I have identified protein sequences from various species that I want to include in the analysis (for both proteins). Each sequence was included based on its similarity (BLAST results) to the known, functionally characterized proteins A and B in Arabidopsis. Not all of the species and sequences we include are necessarily from completely sequenced and annotated genomes.

I am worried, however, that some of the included sequences might represent paralogs rather than orthologs. I do not want to do an analysis in which I search a database for orthologs; I want to identify them among the sequences I already have in my dataset (which were, obviously, included based on certain preselected criteria). Is there any analysis where I can "plug and play" the data that I have and see whether the sequences come out as orthologs (hypothetically, one orthologous group for protein A and one for protein B)?
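One common heuristic that works only on the sequences already in hand is the reciprocal best hit (RBH) test: two sequences from different species are treated as putative orthologs when each is the other's top-scoring match. The sketch below is only an illustration of that idea, not a method taken from this post. It assumes you have already run an all-against-all comparison of your own dataset and collected a score for each ordered pair; the sequence names (`At_A`, `Os_1`, and so on), the species labels, and the scores are all made up for the example.

```python
def best_hits(scores, species_of):
    """For each query, keep its top-scoring subject in every other species."""
    best = {}  # (query, subject_species) -> (subject, score)
    for (query, subject), score in scores.items():
        subject_species = species_of[subject]
        if species_of[query] == subject_species:
            continue  # same-species pairs cannot be orthologs; skip them
        key = (query, subject_species)
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return best


def reciprocal_best_hits(scores, species_of):
    """Return the set of cross-species pairs that are each other's best hit."""
    best = best_hits(scores, species_of)
    pairs = set()
    for (query, _), (subject, _) in best.items():
        back = best.get((subject, species_of[query]))
        if back is not None and back[0] == query:
            pairs.add(frozenset((query, subject)))
    return pairs


# Hypothetical data: which species each sequence comes from.
species_of = {
    "At_A": "arabidopsis", "At_B": "arabidopsis",
    "Os_1": "rice", "Os_2": "rice",
    "Zm_1": "maize",
}

# Hypothetical similarity scores (for example bit scores), query -> subject.
scores = {
    ("At_A", "At_B"): 350, ("At_B", "At_A"): 350,
    ("At_A", "Os_1"): 500, ("Os_1", "At_A"): 500,
    ("At_A", "Os_2"): 300, ("Os_2", "At_A"): 300,
    ("At_B", "Os_1"): 310, ("Os_1", "At_B"): 310,
    ("At_B", "Os_2"): 480, ("Os_2", "At_B"): 480,
    ("At_A", "Zm_1"): 400, ("Zm_1", "At_A"): 400,
    ("At_B", "Zm_1"): 390, ("Zm_1", "At_B"): 390,
}

pairs = reciprocal_best_hits(scores, species_of)
for pair in sorted(sorted(p) for p in pairs):
    print(pair)
```

With these invented scores, the two rice sequences pair off with A and B respectively, while the single maize sequence pairs only with A. A sequence that never forms a reciprocal pair with either reference is a candidate paralog, or simply unresolved. RBH is a rough first pass, and a gene tree built from the same sequences is the usual way to check the grouping, especially when some genomes are incomplete.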
