3D structural prediction, analysis and validation of SARS-CoV-2 protein molecules

Knowing the structure of a protein is of enormous importance but presents great theoretical and technological challenges. The COVID-19 pandemic showed how important it was to determine the structure and shape of SARS-CoV-2 proteins, both to better understand how the virus functions and to develop vaccines and drugs to combat it. This article presents the mathematical-computational and physical-chemical aspects involved in reconstructing and validating the three-dimensional molecular conformation of SARS-CoV-2 proteins, including the variant discovered in patients from Brazil in 2021, the B.1.1.28/P.1 lineage. The methodology incorporates new in silico mutations into already known structures of the viral proteins; the result is then reconstructed computationally with an enumerative feasibility algorithm and validated with the Ramachandran diagram and structural alignment. After the structural reconstruction, the stability of the generated protein is studied through classical molecular dynamics.


Introduction
It is known that the spatial conformation of a protein, which is directly linked to its three-dimensional folding, is decisive for the function it performs and is the main factor governing biological interactions [1]. Protein folding plays a critical role in many diseases, including type II diabetes and neurodegenerative diseases such as Alzheimer's and Parkinson's. Despite major advances in this field, it is still not possible to predict the three-dimensional structures of proteins accurately [2].
Thus, research in this field is essential for the discovery of new drugs and for understanding metabolic processes and the mechanisms of various diseases. In 1968, Cyrus Levinthal observed that, although a protein has an enormous number of possible conformations, it folds into its most stable structure with extreme precision and very quickly. Even today, the rules by which nature carries out this conformational search remain uncertain [2].
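A back-of-the-envelope version of Levinthal's argument makes the scale concrete. The figures below are illustrative assumptions, not data from this article: 3 accessible conformations per residue, a 100-residue chain, and $10^{13}$ conformations sampled per second.
$$N = 3^{100} \approx 5.2 \times 10^{47}, \qquad t = \frac{N}{10^{13}\ \mathrm{s^{-1}}} \approx 5.2 \times 10^{34}\ \mathrm{s} \approx 1.6 \times 10^{27}\ \mathrm{years}.$$
Real proteins fold in microseconds to seconds, so folding cannot proceed by exhaustive search of this kind.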
In 2020, the importance of predicting the three-dimensional structure of proteins faster and more simply became evident with the emergence of a new virus of the Coronaviridae family, which spread around the world within a few weeks and caused a pandemic of a magnitude not seen since the Spanish flu of 1918.
Figure 1: Illustration of the cities where the most worrying variants of the novel coronavirus emerged, as well as the positions affected in the crystallographic structure of the ACE2-RBD interaction (PDB ID: 6M0J). The transmissibility, lethality and antigenicity data for the variants were obtained from [8,10-12].
optimization approach is not so recurrent in these algorithms [17]. Inevitably, including structural minimization by molecular dynamics considerably increases the computational cost, but it is also expected to increase the probability that the result resembles the real structure.
Protein energy landscapes are surfaces with many local minima, so exploring them requires efficient sampling methods; optimization methods are therefore essential in this research field. Among the existing methods are molecular dynamics simulations, in which conformational sampling follows Newton's laws of motion, with the forces derived from a potential energy function of the system. The native state is generally assumed to be the one with the lowest free energy [17].
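As a minimal illustration of how molecular dynamics integrates Newton's laws, the sketch below advances a single particle in a toy harmonic potential with the velocity Verlet scheme. The potential, units, mass and time step are assumptions chosen for illustration; production MD codes add full force fields, thermostats, barostats and many other features not shown here.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * a = F(x) with the velocity Verlet scheme; returns [(x, v), ...]."""
    a = force(x) / mass
    traj = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update from current v and a
        a_new = force(x) / mass              # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        traj.append((x, v))
    return traj


# Toy 1D harmonic "bond" (arbitrary units): U(x) = 0.5 * k * x^2, so F(x) = -k * x.
K, M = 1.0, 1.0


def harmonic_force(x):
    return -K * x


def total_energy(x, v):
    return 0.5 * M * v * v + 0.5 * K * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, mass=M, dt=0.01, n_steps=1000)
e0 = total_energy(*traj[0])
drift = max(abs(total_energy(x, v) - e0) for x, v in traj)
print(f"max energy drift: {drift:.2e}")
print(f"x(t=10) = {traj[-1][0]:.4f}, exact cos(10) = {math.cos(10.0):.4f}")
```

Velocity Verlet is time-reversible and symplectic, so the total energy oscillates around its initial value instead of drifting steadily, which is one reason schemes of this family are widely used in MD.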
The three-dimensional structure can be obtained theoretically by quantum and classical molecular mechanics, experimentally by X-ray crystallography and Nuclear Magnetic Resonance (NMR), and in a hybrid way by homology modeling from the sequence. In the X-ray method, the atomic coordinates are obtained with high accuracy from the diffraction of X-rays by a crystallized sample of the protein, and a rigorous mathematical refinement then yields the three-dimensional structure [18]. The NMR method, in turn, is applied to structures that cannot be crystallized, but the NMR signal provides only a small subset of the interatomic distances in a molecule, because it only yields distances between pairs of atoms that are closer than about 5 to 6 Å [19].
Regardless of the structural elucidation method, the problem is to estimate the complete structure of the molecule by determining the correct position in space of all its atoms. This is known as the Molecular Distance Geometry Problem (MDGP) and is traditionally formulated as a continuous optimization problem [16,19,20]. Determining the three-dimensional structure when all interatomic distances are known can be done in polynomial time, but starting from an incomplete subset of distances (as in NMR data) the problem becomes NP-hard [16]. Studying the problem from the Distance Geometry perspective can therefore contribute decisively to identifying the most feasible protein structure from a set of experimentally obtained distances.

Reconstruction of the Three-Dimensional Structure of Proteins
In this section, the main methods and tools that have been used to describe and determine the three-dimensional structure of a protein are evaluated, focusing on experimental, theoretical/computational and hybrid methods.

Molecular Modeling of Complex Molecules
Complex molecules are composed of carbon bonded to other elements, especially oxygen, hydrogen and nitrogen. Many of them, like proteins, are polymer-like molecules, and the immense variety of possible polymers is what makes them complex [21]. The term Molecular Modeling refers to theoretical methods and computational techniques used to model or mimic the behavior of molecules.
Proteins are organic compounds of high molecular weight, built from amino acids arranged in specific sequences and linked together by peptide bonds. They are considered the most important macromolecules in cells and make up almost 50% of their dry mass [22].
Analyses of various types of proteins show that they are all formed from a standard set of twenty amino acids, called α-amino acids. An α-amino acid consists of a central carbon (Cα) bonded to four groups: an amine group (NH2), a carboxyl group (COOH), a hydrogen atom and a side chain (R) that is specific to each amino acid. Figure 2 shows the general structure of the α-amino acids [23].
Figure 2 presents the different regions that make up an amino acid (the planar structure was built in the academic version of MarvinSketch 20.18). The three-dimensional structure was modeled in GaussView 6.0.16 after a prior geometry optimization in Gaussian 09W at the DFT level, with the hybrid functional B3LYP and the 6-311++G(2d,2p) basis set. The α-amino acids are joined by peptide bonds. During protein synthesis, the carboxyl group of one amino acid loses a hydroxyl group (−OH) and the amine group of the next amino acid loses a hydrogen atom (−H); the OH and H combine to form a water molecule, and a covalent peptide bond (C−N) forms between the carbon (C) of the first amino acid and the nitrogen (N) of the next [23].

Methods Used in the Structural Determination of Proteins
Methods that accurately describe the spatial conformation of proteins have used experimental and theoretical techniques. Experimentally, the three-dimensional structure can be obtained by two methods: X-ray crystallography and Nuclear Magnetic Resonance (NMR).
In the X-ray method, the atomic coordinates are obtained with great precision from the diffraction of X-rays by a crystallized sample of the protein, generally in the presence of water molecules, and a rigorous mathematical refinement then yields the three-dimensional structure [18]. The NMR method, in turn, is applied to structures that cannot be crystallized. The techniques for obtaining structures by NMR are varied and rely heavily on computational aid to build or infer a viable model [24,25]. In general, distances between pairs of atoms are estimated from the magnetic resonance coupling signals of atomic nuclei, usually hydrogen nuclei [26]. The NMR signal gives only a small subset of the interatomic distances in a molecule, because it only yields distances between pairs of atoms closer than about 5 to 6 Å [19].
The theoretical methods for building protein structures are very varied, but there are generally two basic approaches: classical mechanics and quantum mechanics. In the classical approach, only classical force-field interactions, such as Coulomb and van der Waals interactions, are applied to the atoms that make up the structure.
In quantum mechanics, the system is built taking into account the quantum properties of matter. Classical modeling is computationally cheap, whereas quantum calculations become prohibitively expensive for high-molecular-mass proteins (which is the case for the vast majority of proteins). Between these two approaches lies a profusion of hybrid methods that combine experimental data with classical, quantum or mixed theoretical approaches [27,28] or with computationally complex ones [29].
Whatever method is used to build and validate the structure of a protein, the problem is essentially the same: estimating the complete structure of the molecule by determining the correct position in space of all its atoms, known as the Molecular Distance Geometry Problem (MDGP) and traditionally formulated as a continuous optimization problem [16].

Molecular Distance Geometry Problem (MDGP)
Consider a molecule formed by $n$ atoms $a_1, a_2, \dots, a_n$, for which the distances $d_{ij}$ between some pairs of atoms $a_i$ and $a_j$ are known. The Molecular Distance Geometry Problem (MDGP) consists of finding a three-dimensional configuration $x_1, x_2, \dots, x_n$ of the molecule that respects the known Euclidean distances [16,30].
The coordinates $x_1, x_2, \dots, x_n$ can be obtained from the distances $d_{ij}$ by solving the system of distance equations
$$\|x_i - x_j\| = d_{ij}, \quad \forall\, (i,j) \in S,$$
where $S$ is the set of pairs of atoms whose distance $d_{ij}$ is known, $x_i = (v_i, w_i, z_i)^T$ is the coordinate vector of atom $i$, with $v_i$, $w_i$ and $z_i$ its first, second and third coordinates, and $\|\cdot\|$ is the Euclidean norm. The structure is obtained by embedding the atoms in $\mathbb{R}^3$: for a point $p = (v, w, z)$, embedding the point means finding the values of its coordinates $v$, $w$ and $z$.
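The distance equations translate directly into a feasibility check for a candidate embedding. In this minimal sketch, the function names `distance_residuals` and `satisfies`, the tolerance `tol` (needed because floating-point coordinates never satisfy the equations exactly) and the toy coordinates are all hypothetical, and atoms are indexed from 0 rather than 1.

```python
import numpy as np


def distance_residuals(X, distances):
    """Return | ||x_i - x_j|| - d_ij | for every known pair (i, j) in S."""
    return {(i, j): abs(np.linalg.norm(X[i] - X[j]) - d) for (i, j), d in distances.items()}


def satisfies(X, distances, tol=1e-6):
    """True if the embedding X (n x 3 array) satisfies every known distance within tol."""
    return all(r <= tol for r in distance_residuals(X, distances).values())


X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
# S as a dict: (i, j) -> d_ij, only for the pairs whose distance is "known".
S = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): np.sqrt(2.0), (2, 3): np.sqrt(2.0)}
S_bad = {**S, (0, 3): 2.0}   # add a distance the embedding does not respect

print(satisfies(X, S))       # True
print(satisfies(X, S_bad))   # False
```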
The problem of determining the coordinates of the atoms of a molecule from the distances between pairs of atoms can be investigated by placing the atoms in any metric space that respects the given distances [31]. In this work, the problem is studied only in three-dimensional Euclidean space. Depending on the set of distances and the metric space, the MDGP may have one solution, several solutions or no valid solution. In practice, the set of distances may contain errors, because the distances are obtained from experiments, such as NMR and crystallography, or from theoretical estimates [32]. A more practical formulation therefore uses lower and upper bounds $l_{ij}$ and $u_{ij}$ on the distances, in which case the constraints of the problem become
$$l_{ij} \le \|x_i - x_j\| \le u_{ij}, \quad \forall\, (i,j) \in S.$$
The MDGP can be classified into two forms depending on the set of distances [33,34]:
• Complete set of distances: all the distances between any pair of atoms are known; Dong and Wu presented a polynomial algorithm that solves this version of the problem [15];
• Arbitrary set of distances: only some distances between atoms in the molecule are known; this version is NP-complete for embedding in one dimension ($MDGP_1$) and NP-hard for embedding in dimensions greater than one ($MDGP_k$, $k > 1$) [16,35].
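To see why the complete-distance case is tractable, the sketch below recovers coordinates from a full Euclidean distance matrix with classical multidimensional scaling. This is a different technique from the Dong and Wu algorithm cited above; it is shown only because it is short and clearly polynomial (one $O(n^3)$ eigendecomposition). It assumes exact distances, and the result is determined only up to rotation, reflection and translation. The function name `classical_mds` and the random test coordinates are hypothetical.

```python
import numpy as np


def classical_mds(D, dim=3):
    """Coordinates (up to a rigid motion or reflection) from a complete distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of centered coordinates
    w, V = np.linalg.eigh(G)                     # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]              # indices of the `dim` largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)           # guard against tiny negative round-off
    return V[:, idx] * np.sqrt(w_top)


def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


rng = np.random.default_rng(0)
X_true = rng.normal(size=(10, 3))                # hypothetical "atoms"
D = pairwise_distances(X_true)
X_rec = classical_mds(D)
print(np.abs(D - pairwise_distances(X_rec)).max())   # round-off level
```

With missing or noisy distances the Gram matrix can no longer be formed reliably, which is where the discrete formulation discussed next becomes useful.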
There are several algorithms to solve the MDGP, and the vast majority of them solve the classical continuous version of the problem, such as Geometric Build-up [36]. However, Lavor et al. proposed a discrete model, the Discretizable Molecular Distance Geometry Problem (DMDGP), for arbitrary sets of distances [16,29].
The Discretizable Molecular Distance Geometry Problem is defined as follows. Given a simple, undirected, weighted graph $G = (V, E, d)$ and an order $v_1, \dots, v_n$ of the vertices of $V$, called the backbone ordering (identifying each vertex $v_i$ with its index $i$), that satisfies the following requirements:
1. $E$ contains all 4-cliques of consecutive vertices: $\{v_{i-3}, v_{i-2}, v_{i-1}, v_i\}$ is a clique for every $i \in \{4, \dots, n\}$;
2. the strict triangle inequality $d_{i-2,i} < d_{i-2,i-1} + d_{i-1,i}$ holds for every $i \in \{3, \dots, n\}$;
the problem is to find an embedding $x: V \to \mathbb{R}^3$ such that $\|x_u - x_v\| = d_{uv}$ for every $\{u, v\} \in E$.
The first requirement demands that the distances between atoms separated by up to three consecutive bonds are known. The second requirement forces the bond angles not to be multiples of $\pi$. Together, they mean that the embedding $x_v$ of atom $v$ lies at the intersection of three spheres $S_3$, $S_2$ and $S_1$, centered at $x_{v-3}$, $x_{v-2}$ and $x_{v-1}$ with radii $d_{v-3,v}$, $d_{v-2,v}$ and $d_{v-1,v}$ (Figure 3). Discretization requires that all 4-cliques of consecutive vertices are subgraphs of $G$, and each 3-subclique $\{v_{i-2}, v_{i-1}, v_i\}$ is used to test the triangle inequality. If both assumptions hold, each atom has at most two possible positions. For this version of the problem (DMDGP), the main solution method is the Branch and Prune (BP) algorithm, an implicit enumeration method that enumerates the possible positions of the atoms and discards the invalid ones. In this work, the BP algorithm was used for the structural reconstruction of the viral protein mutations.
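The two DMDGP requirements can be checked mechanically before running BP. The sketch below assumes 0-based indices, at least four vertices, and distances stored in a dict keyed by `frozenset({i, j})`; the helper names `distances_from_coords` and `check_dmdgp_order`, the tolerance `eps` and the toy coordinates are hypothetical. The straight chain fails requirement 2 because its consecutive triples are collinear with the middle atom between the others (bond angle $\pi$).

```python
from itertools import combinations

import numpy as np


def distances_from_coords(X, max_sep=3):
    """Hypothetical helper: distances between atoms at most `max_sep` positions apart."""
    return {frozenset((i, j)): float(np.linalg.norm(X[i] - X[j]))
            for i, j in combinations(range(len(X)), 2) if j - i <= max_sep}


def check_dmdgp_order(d, n, eps=1e-9):
    """Check the two DMDGP requirements for vertices 0..n-1 (assumes n >= 4).

    d maps frozenset({i, j}) -> known distance d_ij.
    Returns (True, "ok") or (False, reason).
    """
    def dist(i, j):
        return d.get(frozenset((i, j)))

    # Requirement 1: every window of 4 consecutive vertices is a clique (6 known distances).
    for i in range(3, n):
        for a, b in combinations(range(i - 3, i + 1), 2):
            if dist(a, b) is None:
                return False, f"missing distance {a}-{b}"
    # Requirement 2: strict triangle inequality on each consecutive triple; it rules out
    # three consecutive atoms on a straight line with the middle one in between.
    for i in range(2, n):
        if not dist(i - 2, i) < dist(i - 2, i - 1) + dist(i - 1, i) - eps:
            return False, f"triangle inequality not strict at vertex {i}"
    return True, "ok"


zigzag = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0],
                   [3.5, 1.4, 0.3], [4.0, 2.8, 0.5]])
straight = np.array([[1.5 * k, 0.0, 0.0] for k in range(5)])
print(check_dmdgp_order(distances_from_coords(zigzag), 5))    # (True, 'ok')
print(check_dmdgp_order(distances_from_coords(straight), 5))  # (False, 'triangle inequality not strict at vertex 2')
```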

Branch and Prune Algorithm
The algorithm assigns an order to the atoms of the molecule, and for each atom $v \in V$ to be embedded in $\mathbb{R}^3$, two assumptions are made [16,37,38]:
• valid embeddings are known for all the atoms that precede $v$;
• the distances between $v$ and its three immediate predecessors are known.
The Branch-and-Prune algorithm shown in Algorithm 1 has five input arguments:
• the graph $G$;
• the vertex $v$ to be embedded in $\mathbb{R}^3$;
• the set $U$ of already embedded vertices that precede $v$;
• a valid embedding $x'$ for the subgraph $G[U]$;
• a set $X$ of valid embeddings of $G$ already found.
The recursion starts with BranchAndPrune$(G, 4, \{1, 2, 3\}, y, \emptyset)$, where $y$ is a valid embedding of atoms 1, 2 and 3. The algorithm builds a binary tree in which each level $v$ represents the possible spatial positions $p$ of vertex $v$. At the end of the run, the set $X$ contains all valid embeddings of $G$ that extend $x'$. Pruning can be done with the Direct Distance Feasibility (DDF) test, which considers the distances from $v$ to the vertices of the subset $\{u \in U : \{u, v\} \in E,\ u \notin \{v-1, v-2, v-3\}\}$, that is, vertices that have known distances to $v$ and were not used to determine its position. If $\|x_u - x_v\| \neq d_{u,v}$ for any such $u$, the position is not a valid embedding and the whole subtree below it can be pruned, as shown in Figure 4.
Figure 4: Binary tree resulting from the execution of the Branch-and-Prune algorithm. After the first three base atoms are fixed in Cartesian space, for each atom of the protein BP generates two mathematically valid candidate positions using the intersection of three spheres. To test whether a newly generated branch is valid, the algorithm searches the input instance for a known distance between the current vertex and an already embedded vertex that was not used to determine its position, and applies the DDF pruning test; if the test invalidates the embedding, the entire subtree is pruned.
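The recursion can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: indices are 0-based, the set $U$ is implicitly $\{0, \dots, v-1\}$, the DDF test uses a tolerance `eps` instead of exact equality, and the instance is built from a hypothetical helix-shaped chain rather than protein data. The names `place_first_three`, `sphere_intersection`, `make_instance` and `branch_and_prune` are illustrative.

```python
import numpy as np


def place_first_three(d01, d02, d12):
    """Fix atoms 0, 1, 2 in a canonical frame, removing rigid-body freedom."""
    x = (d01**2 + d02**2 - d12**2) / (2.0 * d01)   # law of cosines
    y = np.sqrt(max(d02**2 - x**2, 0.0))
    return np.array([[0.0, 0.0, 0.0], [d01, 0.0, 0.0], [x, y, 0.0]])


def sphere_intersection(p1, p2, p3, r1, r2, r3):
    """The (at most two) points at distances r1, r2, r3 from p1, p2, p3."""
    d = np.linalg.norm(p2 - p1)
    ex = (p2 - p1) / d
    i = ex @ (p3 - p1)
    ey = p3 - p1 - i * ex
    ey = ey / np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    j = ey @ (p3 - p1)
    x = (r1**2 - r2**2 + d**2) / (2.0 * d)
    y = (r1**2 - r3**2 + i**2 + j**2) / (2.0 * j) - (i / j) * x
    z2 = r1**2 - x**2 - y**2
    if z2 < -1e-9:
        return []                                  # the three spheres do not meet
    z = np.sqrt(max(z2, 0.0))
    base = p1 + x * ex + y * ey
    return [base + z * ez, base - z * ez]          # the two branches of the tree


def make_instance(X_true, cutoff=6.0):
    """Discretization distances (|i-j| <= 3) plus NMR-like pruning distances (< cutoff)."""
    n = len(X_true)
    D = np.linalg.norm(X_true[:, None] - X_true[None, :], axis=-1)
    dist = {(i, j): D[i, j] for i in range(n) for j in range(i + 1, min(i + 4, n))}
    prune = {v: [(u, D[u, v]) for u in range(v - 3) if D[u, v] < cutoff] for v in range(n)}
    return D, dist, prune


def branch_and_prune(v, X, dist, prune, solutions, eps=1e-6):
    """X[:v] holds a valid partial embedding; try both candidate positions for atom v."""
    if v == len(X):
        solutions.append(X.copy())
        return
    for cand in sphere_intersection(X[v - 3], X[v - 2], X[v - 1],
                                    dist[(v - 3, v)], dist[(v - 2, v)], dist[(v - 1, v)]):
        # DDF: check known distances to earlier atoms that were not used to place v.
        if all(abs(np.linalg.norm(cand - X[u]) - d_uv) <= eps for u, d_uv in prune[v]):
            X[v] = cand
            branch_and_prune(v + 1, X, dist, prune, solutions, eps)


t = np.radians(100.0) * np.arange(10)
X_true = np.stack([np.cos(t), np.sin(t), 0.6 * np.arange(10)], axis=1)  # toy chain
D, dist, prune = make_instance(X_true)
X = np.zeros_like(X_true)
X[:3] = place_first_three(D[0, 1], D[0, 2], D[1, 2])
solutions = []
branch_and_prune(3, X, dist, prune, solutions)
print(len(solutions), "embedding(s) found")
```

At the fourth atom both branches always survive, because nothing before it can distinguish an embedding from its mirror image; this is why solutions typically come in mirror-image pairs.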

Biochemical Validation by Ramachandran Plot
The Ramachandran plot describes the ϕ/ψ torsion angles of the protein backbone, providing an overview of the conformation of a protein. The ϕ/ψ angles are grouped into distinct regions on the Ramachandran graph, where each region corresponds to a specific secondary structure. There are four basic types of Ramachandran graphs, depending on the stereochemistry of the amino acid: generic (which refers to the 18 non-glycine, non-proline amino acids), glycine, proline, and pre-proline (which refers to residues preceding a proline).
The torsion angles ϕ (phi) and ψ (psi) are defined for each amino acid residue. They describe rotations: ϕ is the rotation around the N−Cα bond of the residue, and ψ is the rotation around the Cα−C bond of the same residue [39]. In principle, the dihedral angles ϕ and ψ can take any value between −180° and +180°, but many values are forbidden by steric hindrance between the backbone atoms and the amino acid side chains. Only a limited range of values yields a realistic spatial conformation of the protein, and structures with forbidden or unrealistic values must be discarded.
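Given backbone coordinates, ϕ and ψ follow from a standard dihedral-angle formula: ϕ of residue $i$ uses the atoms C$_{i-1}$, N$_i$, Cα$_i$, C$_i$, and ψ uses N$_i$, Cα$_i$, C$_i$, N$_{i+1}$. The sketch below uses the common atan2 formulation, whose sign follows the usual IUPAC convention (positive for a clockwise rotation of the front bond when viewed along the central bond); the function names and the test points are hypothetical, and tools such as MolProbity perform this step internally.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in [-180, 180], for points p0-p1-p2-p3."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - (b0 @ b1) * b1          # component of b0 perpendicular to the central bond
    w = b2 - (b2 @ b1) * b1          # component of b2 perpendicular to the central bond
    x = v @ w
    y = np.cross(b1, v) @ w
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(N, CA, C, i):
    """phi and psi of residue i from backbone arrays N, CA, C (needs residues i-1 and i+1)."""
    phi = dihedral(C[i - 1], N[i], CA[i], C[i])
    psi = dihedral(N[i], CA[i], C[i], N[i + 1])
    return phi, psi


def rotated_end(theta_deg):
    """Fourth point rotated by theta around the z axis (toy geometry for checking signs)."""
    th = np.radians(theta_deg)
    return np.array([np.cos(th), np.sin(th), 1.5])


p0, p1, p2 = np.array([1.0, 0.0, 0.0]), np.zeros(3), np.array([0.0, 0.0, 1.5])
print(round(dihedral(p0, p1, p2, rotated_end(60.0)), 6))     # 60.0
print(round(dihedral(p0, p1, p2, rotated_end(-120.0)), 6))   # -120.0
```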
The Ramachandran diagram has proved very useful for assessing whether a protein conformation is plausible, as it tests the quality of three-dimensional structures [22]. Figure 5 shows a model Ramachandran plot, with contour lines highlighted in green for the allowed regions of the dihedral angles and white for regions that cause steric collisions.

Methodology Proposed for the Structural Calculation of Variants
Because more accurate structural information, such as crystallographic structures containing the mutations that make up the P.1 lineage identified in the state of Amazonas, was not available, the methodology shown in Figure 6 was adopted for the structural calculation of the variants [41].
First, the original (non-mutated) structure file of the viral protein is retrieved from the RCSB Protein Data Bank (PDB) [42]. The in silico mutation is then applied with PyMOL 2.3 [43] using its "Mutagenesis" module, selecting the rotamer with the lowest steric strain (chosen automatically by the software). In the third stage, test instances were generated by simulating an arbitrary set of distances, keeping only distances below 6 Å, thus mimicking the restrictions of NMR characterization. To strengthen the tests, some instances used only backbone atoms and others used both backbone and side-chain atoms.
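The third stage, building an NMR-like instance, can be sketched as follows. The assumptions are loudly hypothetical: coordinates and atom names are already loaded into arrays (parsing the PDB file is not shown), the backbone is taken to be the atoms named N, CA, C and O in PDB convention, and the function name `nmr_like_instance` and the toy coordinates are illustrative, not the authors' code.

```python
import numpy as np

BACKBONE = {"N", "CA", "C", "O"}   # assumption: PDB atom names treated as backbone


def nmr_like_instance(coords, names, cutoff=6.0, backbone_only=False):
    """Return (i, j, d_ij) for all atom pairs closer than `cutoff` angstroms."""
    idx = [k for k, nm in enumerate(names) if not backbone_only or nm in BACKBONE]
    sub = coords[idx]
    D = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
    i, j = np.triu_indices(len(idx), k=1)          # each unordered pair once
    keep = D[i, j] < cutoff
    return [(idx[a], idx[b], float(D[a, b])) for a, b in zip(i[keep], j[keep])]


# Toy coordinates in angstroms (hypothetical, not taken from any PDB entry).
coords = np.array([[0.0, 0.0, 0.0], [1.46, 0.0, 0.0], [2.0, 1.4, 0.0],
                   [3.2, 1.5, 0.0], [1.9, -0.8, 1.2], [10.0, 0.0, 0.0]])
names = ["N", "CA", "C", "O", "CB", "N"]

pairs = nmr_like_instance(coords, names)
print(len(pairs), "distances kept")                # the far atom 5 contributes none
print(len(nmr_like_instance(coords, names, backbone_only=True)), "backbone distances kept")
```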
The structural reconstruction stage was subdivided into structural calculation and structural validation. The Branch and Prune algorithm was used for the structural calculation of the virus variants. Because this structural prediction algorithm can generate several mathematically valid structures but uses only interatomic distances to embed the vertices in space, it does not guarantee the chemical validity of each generated structure; a chemical validation step was therefore added. For this structural validation, the Ramachandran plot, which checks the conformation of proteins, was used (see Figures 7 and 8).
The Ramachandran diagrams were generated with MolProbity [44] through the "Evaluation Module" of SWISS-MODEL (https://swissmodel.expasy.org/assess) [45], which reports the percentage of amino acids in the favorable regions and in regions with steric restrictions. The reconstructed structures were aligned with the crystallographic structure using TM-align (https://zhanglab.ccmb.med.umich.edu/TM-align/) [46]. For subsequent visualization, Schrödinger Maestro 2020-4 was used with the "Superposition" module applied to the Cα atoms.
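The RMSD values reported after alignment measure the deviation between matched Cα atoms after superposition. The sketch below computes that quantity with the Kabsch algorithm (optimal proper rotation via SVD). It is not TM-align, which uses its own TM-score-driven alignment; the function name `kabsch_rmsd` and the random test coordinates are hypothetical, and the two point sets are assumed to be already matched atom by atom.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 point sets after optimal rotation and translation."""
    P = P - P.mean(axis=0)                 # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                            # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0   # forbid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(42)
P = rng.normal(size=(20, 3))
theta = np.radians(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])       # rotated and translated copy of P
print(kabsch_rmsd(P, Q))                         # round-off level
print(kabsch_rmsd(P * np.array([1.0, 1.0, -1.0]), P))   # mirror image: clearly nonzero
```

Because reflections are excluded, a mirror-image embedding does not superpose onto the original, which is one way to tell apart the symmetric solutions a distance-only method can return.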

Molecular dynamics parameters
Molecular dynamics simulations were also performed for the ACE2-RBD complex (PDB ID: 6M0J) [47] and for its counterpart carrying the B.1.1.28/P.1 mutations (E484K, K417T, N501Y) over 50 ns. All proteins were first prepared with the academic version of Schrödinger Maestro 2020-4 using the "Protein Preparation" module. All input and configuration files were generated with the QwikMD 1.3 plugin [48] in the VMD 1.9.4.48a graphical interface [49]. Each system was immersed in a cubic solvation box with periodic boundary conditions (PBC) filled with TIP3P water molecules, and Na+ and Cl− ions were added to neutralize the system at a physiological concentration of 0.15 mol·L−1. All topology files were generated with the CHARMM36 force field [50]. The system was minimized for 1000 steps with a conjugate gradient algorithm, then gradually heated from 60 to 300 K in the NPT ensemble, and finally equilibrated in the NPT ensemble for 1 ns with a steepest descent algorithm.
Finally, the production trajectory for ACE2-RBD was computed in the NPT ensemble for approximately 50 ns. A Langevin piston was used as a barostat to keep the pressure at 1 atm, while a Nosé-Hoover thermostat maintained the temperature at 300.0 K. All simulations were run with NAMD3 (Nanoscale Molecular Dynamics) [51], accelerated by an Nvidia GTX 1050 GPU (2 GB, 640 CUDA cores), with a 2 fs time step. Conformational changes in the macromolecule caused by the mutations were measured by the time evolution of the RMSD and by the RMSF of the atomic displacements of the Cα atoms of the ACE2-RBD complex, using frame 0 as the reference. The trajectory analysis was performed with the MDAnalysis library [52].
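The two trajectory metrics can be written in a few lines of NumPy. This minimal sketch assumes the Cα coordinates have already been extracted into an array of shape (n_frames, n_atoms, 3) and superposed onto the reference frame; in practice MDAnalysis provides equivalent analyses in its `MDAnalysis.analysis.rms` module. The function names and the tiny synthetic trajectory are hypothetical.

```python
import numpy as np


def rmsd_series(traj, ref_frame=0):
    """RMSD of every frame to `ref_frame`; traj has shape (n_frames, n_atoms, 3)."""
    diff = traj - traj[ref_frame]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))


def rmsf(traj):
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    mean_pos = traj.mean(axis=0)
    return np.sqrt(((traj - mean_pos) ** 2).sum(axis=2).mean(axis=0))


# Synthetic 3-frame, 2-atom trajectory: atom 0 moves along x, atom 1 stays fixed.
traj = np.zeros((3, 2, 3))
traj[1, 0, 0] = 1.0
traj[2, 0, 0] = 2.0
print(rmsd_series(traj))   # approx. [0.0, 0.7071, 1.4142]
print(rmsf(traj))          # approx. [0.8165, 0.0]
```

RMSD summarizes the whole structure per frame, while RMSF summarizes each atom over time, which is why the latter can reveal flexible residues even when the global RMSD is low.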

Computational Results
This section presents the 3D structures generated in experiments with proteins of the new coronavirus. For the tests, test instances were created that obey the same restrictions as NMR characterization.
The SARS-CoV-2 structures chosen for testing in this work are shown in Table 1. All possible structures in the generation tree built by the BP algorithm were calculated and analyzed. All generated structures respected the mathematical constraints of the problem, but some were considered chemically invalid because their Ramachandran plots showed that they violated the physicochemical constraints of proteins. The first test was performed on the ACE2-RBD complex instance and generated two three-dimensional structures (sets of X, Y, Z coordinates) as solutions, both respecting the mathematical distance constraints. However, the Ramachandran plots generated for the two solutions showed that the first was entirely invalid, with almost all points in disallowed regions. The second, in contrast, produced a Ramachandran plot that respected the allowed regions and is therefore an acceptable structure, as shown in Figure 7. The reference structure to which mutagenesis and the respective reconstruction were applied was ACE2-RBD (PDB ID: 6M0J). All images were generated with the MolProbity platform.
The chemically valid structure of the ACE2-RBD complex (PDB ID: 6M0J) containing the mutations of the P.1 strain had 97.06% of its amino acids in the region without steric hindrance. In addition, this most consistent solution had 2.92% of its torsion angles in marginal regions and 0.38% in regions of total steric hindrance. For comparison, the reference structure of the P.1 variant of the ACE2-RBD complex (PDB ID: 7NXC) has 95.0% of its amino acids in the region without steric hindrance. By this metric, the reconstruction is at least as stereochemically consistent as the experimental reference, an important indication of the consistency of the algorithm developed in this work.
The test for the antibody-antigen complex (PDB ID: 7BWJ) followed the same pattern as the first: it generated two 3D structures as solutions, one of which proved inconsistent after validation. The reconstructed structure had 93.97% of its amino acids in the region without steric hindrance, as shown in Figure 8. Thus, even though the algorithm cannot remove the steric conflicts induced by mutagenesis, the structural reconstruction was satisfactorily close to the crystallographic structure. The reference structure to which mutagenesis and the respective reconstruction were applied was Spike-Antibody (PDB ID: 7BWJ). All images were generated with the MolProbity platform.
Figure 9 shows that the most consistent solution reconstructed by the algorithm had an RMSD of 0.483 Å for the ACE2-RBD complex containing the P.1 mutations when aligned with the crystallographic structure PDB ID: 7NXC, whereas the invalid solution had a much higher RMSD of 7.49 Å. The Branch-and-Prune implementation achieved the second-best RMSD among six algorithms, 0.483 Å, behind only SWISS-MODEL with 0.480 Å. Nevertheless, these apparently promising results of the BP implementation may be due to chance: the mutations that constitute the P.1 variant did not produce significant conformational changes, so, since the reconstruction starts from the Spike protein without mutations (PDB ID: 6M0J), its result coincidentally turned out very similar to the reference structure of the P.1 variant (PDB ID: 7NXC).
From the average molecular dynamics results (see Table 2), we observed lower RMSF values, lower solvent exposure (measured by SASA) and a larger number of hydrogen bonds. Overall, therefore, the mutations of the P.1 variant appear to have stabilized the ACE2-RBD structure, although the RMSF analysis showed increased structural flexibility in some central residues. Only two analyses, the higher radius of gyration (Rg) and the lower number of native contacts, support the hypothesis of greater instability, along with relatively higher RMSD values. Greater stabilization of the ACE2-RBD complex as a result of certain mutations may help explain why the virus has followed a convergent evolution, as already reported in some experimental studies [60-63], although its causes remain unknown. The simulations in this work suggest a tendency towards greater structural stability in the most frequent mutations; in other words, mutations that have recurred in several strains, such as E484K and N501Y, may tend to have greater thermodynamic stability. Finally, the impacts of the P.1 lineage will become clearer with repeated simulations or longer time intervals, which would give more conclusive results, converged fluctuations and greater reproducibility.
Table 2: Comparison between the P.1 variant and the wild type in terms of the average values of parameters from the 50 ns molecular dynamics of ACE2-RBD (PDB ID: 6M0J) that quantify structural changes in the Spike RBD region in its interaction with ACE2. All simulations were performed with NAMD3 [51].
The structural reconstruction of the P.1 variant was very consistent, as the RMSD in alignment with crystallography was only 0.483 Å (see Figure 9).
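Two of the compactness metrics discussed above, the radius of gyration and the fraction of native contacts, can be sketched as follows. The article does not state which definitions were used, so these are assumptions: an (optionally mass-weighted) Rg, and a hard-cutoff native-contact count with a 4.5 Å cutoff and a minimum sequence separation of four positions. The function names and the ring-shaped toy coordinates are hypothetical.

```python
import numpy as np


def radius_of_gyration(X, masses=None):
    """(Optionally mass-weighted) radius of gyration of coordinates X (n_atoms x 3)."""
    m = np.ones(len(X)) if masses is None else np.asarray(masses, dtype=float)
    com = (m[:, None] * X).sum(axis=0) / m.sum()
    return float(np.sqrt((m * ((X - com) ** 2).sum(axis=1)).sum() / m.sum()))


def native_contacts_fraction(X, X_ref, cutoff=4.5, min_sep=3):
    """Fraction of reference contacts (d < cutoff, j - i > min_sep) still present in X."""
    def contacts(Y):
        D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
        i, j = np.triu_indices(len(Y), k=min_sep + 1)
        return {(int(a), int(b)) for a, b in zip(i, j) if D[a, b] < cutoff}

    native = contacts(X_ref)
    if not native:
        return float("nan")
    return len(native & contacts(X)) / len(native)


# Toy 6-atom ring (hypothetical); moving atom 5 away breaks two of its three contacts.
X_ref = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0],
                  [3.0, 1.5, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 0.0]])
X_moved = X_ref.copy()
X_moved[5] = [0.0, 20.0, 0.0]
print(radius_of_gyration(X_ref), radius_of_gyration(X_moved))   # Rg grows
print(native_contacts_fraction(X_moved, X_ref))                 # 1/3 of contacts remain
```

A larger Rg together with fewer native contacts is the usual signature of a less compact structure, which is why these two metrics are read together.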
After the structural reconstruction with the BP algorithm, we performed a conjugate gradient minimization; the resulting potential energy (see Table 3) was −33060.39 kcal/mol, and 97.4% of the amino acids were in the most favorable region of the Ramachandran diagram. By this criterion, the minimized structure was even more consistent than the experimental structure recently deposited in the Protein Data Bank (PDB ID: 7NXC), which has 95.0%.

Structural Reconstruction of Amazonian G196V and L84S Mutations
Two other important predictions concerned mutations found in the state of Amazonas: the folding of the ORF3a protein with the G196V mutation and of the ORF8 protein with the L84S mutation.
When performing structural alignment on ORF3a, differences were relatively small with RMSD of 3.29Å. Regarding the ORF8 structure, there was a displacement with RMSD of 0.27SÅ compared to the variant. Although the changes may only have been due to the stochastic nature of molecular dynamics, the mutation may actually have induced subtle conformational changes seen after structural minimization.
The most likely solution for the ORF8 structure (PDB ID: 7JX6) had 91.07% of its residues in the most favorable region, as shown in Figure 10. The crystallographic structure has exactly the same percentage of amino acids in this region, although some torsion angles in the prediction are still inconsistent with crystallography. The structural alignment gave an RMSD of 0 Å, indicating that the algorithm correctly reconstructed the input structure.
For the ORF3a structure (PDB ID: 6XDC), the most consistent prediction had 95.93% of its residues in the most favored region. As in the previous tests, the experimentally obtained structure had the same percentage in the region without steric hindrance in the Ramachandran diagram, and the alignment RMSD was 0 Å. The Ramachandran plots for this structure are shown in Figure 10.

Tests Performed with Other Virus Variants to Validate the Methodology
The other tests performed with the test instances followed the pattern of the previous ones. Each test generated two or more mathematically valid 3D structures as solutions; however, after validation with the Ramachandran plot, one of them was always inconsistent and therefore chemically invalid. A summary of the results is shown in Table 4. The Ramachandran plots in Figure 11 compare the crystallographic structure of PDB ID: 6WTT with the two structures reconstructed in the test. The structure considered valid had the same percentage of residues in the favorable region as the crystallographic structure, 94.78%, while the invalid solution had a very low percentage, 26.33%, which invalidated it. For the variant PDB ID: 6XS6, the crystallographic structure had 93.88% of its residues in the favorable region. The most likely solution reconstructed by BP for this structure had a slightly lower percentage, 93.83%, as shown in Figure 12, and the structure invalidated by the Ramachandran plot had only 24.32% of its residues in the allowed region. The tests for the reference file PDB ID: 6YWK are shown in Figure 13: the chemically invalid solution had 39.37% of its residues in the allowed region, while the solution validated by the Ramachandran plot had 99.65% of its residues in the favorable region. The percentage for the valid structure equaled that of the crystallographic structure, which again shows that the Branch-and-Prune algorithm obtained at least one fully valid solution very close to the experimental one.
The test performed with PDB ID: 6W37 followed the same pattern, generating two 3D structures as solutions, one of which proved inconsistent after validation, as shown in Figure 14. The structural overlap between the valid solution and the corresponding crystallographic structure gave an RMSD close to zero. In contrast, the solution with the most inconsistencies in the Ramachandran diagram was precisely the one that aligned poorly, with an RMSD of 3.06 Å.

Concluding Remarks
The methodology developed in this work proved effective in reconstructing proteins of the P.1 variant of the SARS-CoV-2 virus, which emerged in Amazonas state, Brazil, from mutated data of the wild-type proteins. Applying the BP algorithm with Ramachandran validation, we found that the structural reconstructions were consistent under the same restrictions as experimental NMR characterization. This reinforces the view that computer science is contributing decisively to the challenge of protein reconstruction, even though the physical-chemical principles that govern folding are not yet fully understood.
An interesting observation from testing instances containing both backbone and side-chain atoms was that, although Branch-and-Prune was designed to find the backbone structure, it was able to reconstruct the entire tested structure, with one of its solutions valid according to both the mathematical and the physical-chemical constraints. More tests are being carried out to validate this observation. As future work, we intend to implement a heuristic that takes the amino acid sequence as input and, despite the high computational complexity, predicts its tertiary structure. We also intend to add an objective function based on a classical force field to impose physical-chemical constraints throughout the protein reconstruction.