Conservation Analysis
NS3-protease variability was first assessed across 1568 HCV sequences by calculating the prevalence of the most common (wild-type) nucleotide at each position of the NS3 gene. The impact of nucleotide variability on the NS3 protein was then determined by evaluating the prevalence of wild-type and mutated amino acids. Fourteen positions involved in the development of resistance to either linear or macrocyclic PIs were analyzed [8,10,20,31–35,41]. Resistance-associated mutations (RAMs) were divided into major and minor, with major RAMs defined by an in vitro FC >10 to at least one linear/macrocyclic PI. In particular, 19 major RAMs were analyzed: 7 associated with resistance to both linear and macrocyclic PIs (54A/S, 155K/Q/T, 156T/V), 5 exclusively related to linear compounds (36A/M, 55A, 170A/T), and 7 exclusively related to macrocyclic compound
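The per-position prevalence calculation described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and ignores gaps and ambiguity codes; the function names and the toy alignment are hypothetical and are not the software or the data used in this study.

```python
from collections import Counter


def position_prevalence(aligned_seqs):
    """For each alignment column, return (most common residue, prevalence in %)."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned_seqs):
        # The most frequent residue in the column is treated as wild type.
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, 100.0 * count / len(column)))
    return result


def mutation_prevalence(aligned_seqs, position, residues):
    """Percentage of sequences carrying any of `residues` at 1-based `position`."""
    hits = sum(1 for seq in aligned_seqs if seq[position - 1] in residues)
    return 100.0 * hits / len(aligned_seqs)


# Toy amino acid alignment (hypothetical, for illustration only).
toy_alignment = ["ATVA", "ATVA", "ASVA", "ATVT"]
for index, (residue, percent) in enumerate(position_prevalence(toy_alignment), start=1):
    print(index, residue, percent)
```

The same functions apply to nucleotide alignments, since each column is simply treated as a set of characters; `mutation_prevalence` would be called once per RAM position with the mutated residues of interest.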

of the most relevant interaction points. To obtain a suitable model, these operations should be repeated using at least three different probes: a generic hydrophobic probe (DRY), a hydrogen-bond acceptor (O) and a hydrogen-bond donor (N1). In the fifth step the information obtained from the different probes is unified into a preliminary pharmacophore model. We carried out the GBPM analysis up to the fifth step of the procedure, in order to highlight the residues most involved in the recognition areas. In the GRID [45] calculations the lone pairs, the tautomeric hydrogen atoms and the torsion angles relative to the sp3 oxygen atoms and the amide atoms were allowed to adjust according to the influence of the probe, while the coordinates of all other atoms were kept rigid (directive MOVE = 0). Default values were used for the other parameters. In our analysis we used the N1 (hydrogen-bond donor), O (hydrogen-bond acceptor) and DRY (hydrophobic) probes [44,45]. The component interaction analysis was performed starting from the experimental wild-type HCV protease complex (PDB 2OC8) under the following conditions: a) OPLS2005 as force field; b) GB/SA implicit water solvation model; c) dielectric constant equal to 1; d) a binding pocket defined by the protease residues within 12 Å of boceprevir (Maestro Graphics User Interface, ver. 9.8, Schrödinger, LLC). Because the global energy minima of the GRID points (Emin) spanned a wide range of values, graphical analysis of the GRID maps was carried out by considering, for each probe, an energy threshold (Ecut) equal to 60% of the Emin of the protease-boceprevir complex, as previously reported [46].
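The Ecut filter can be sketched as follows. This is only an illustration under stated assumptions: GRID interaction energies are taken to be negative when favourable, so that a point passes the filter when its energy is at or below Ecut = 0.60 × Emin; the point representation as (x, y, z, energy) tuples, the function name and the toy values are hypothetical and do not reproduce the GRID file format or any result of this study.

```python
def select_grid_points(points, complex_emin, fraction=0.60):
    """Return (Ecut, points with energy <= Ecut), where Ecut = fraction * Emin.

    `points` is an iterable of (x, y, z, energy) tuples for one probe.
    Assumes favourable interaction energies are negative.
    """
    if complex_emin >= 0:
        raise ValueError("expected a negative (favourable) Emin")
    ecut = fraction * complex_emin
    kept = [point for point in points if point[3] <= ecut]
    return ecut, kept


# Toy map for a single probe (hypothetical values, arbitrary energy units).
toy_map = [
    (0.0, 0.0, 0.0, -10.0),
    (1.0, 0.0, 0.0, -7.5),
    (0.0, 1.0, 0.0, -6.5),
    (0.0, 0.0, 1.0, -3.0),
    (1.0, 1.0, 1.0, 1.2),
]
ecut, kept = select_grid_points(toy_map, complex_emin=-10.0)
print(ecut, len(kept))
```

Expressing the threshold as a fraction of Emin, rather than as a fixed energy, keeps the maps of the three probes comparable even when their minima differ widely.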

Putative Secondary RNA Structure
A full-length HCV-1b genome obtained from GenBank (accession number: AJ000009) was used for the prediction of RNA secondary structures with the Mfold program at 37 °C, available at the UNAFold server (http://mfold.rna.albany.edu) [47]. This algorithm, based on the thermodynamics of RNA structural motifs, including base-paired intramolecular stems and unpaired loops, identifies putative optimal minimum free energy structures. RNA structure models and free energy values were individually predicted using the original viral HCV-1b genome AJ000009, with and without the introduction into the NS3-protease coding region of specific resistance mutations at position 156: A156S, A156T, A156G, and A156V. Secondary RNA structures were also individually predicted with the CONTRAfold software, analyzing the NS3 fragment covering amino acids 135 to 181. This software uses probabilistic parameters learned from a set of RNA secondary structures to predict base-pair probabilities and structures using the maximum expected accuracy approach [48,49].
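The preparation of mutated input sequences for such predictions can be sketched as below. The sketch assumes that the input is the NS3 coding sequence in the DNA alphabet, with codon 1 starting at the first nucleotide, and that each substitution is introduced as a single-nucleotide change of the alanine codon (GCN); the codons actually introduced are not specified in the text, and the function names and the toy sequence are hypothetical.

```python
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}

# First two bases of the mutant codon; the third base of the original
# alanine codon is retained, so each change is a single substitution.
# Standard genetic code: TCN = Ser, ACN = Thr, GGN = Gly, GTN = Val.
TARGET_PREFIX = {"S": "TC", "T": "AC", "G": "GG", "V": "GT"}


def mutate_alanine_codon(cds, aa_position, new_aa):
    """Replace the Ala codon at 1-based `aa_position` with a codon for `new_aa`."""
    start = 3 * (aa_position - 1)
    codon = cds[start:start + 3].upper()
    if codon not in ALA_CODONS:
        raise ValueError(f"codon {codon!r} at position {aa_position} is not alanine")
    if new_aa not in TARGET_PREFIX:
        raise ValueError(f"unsupported substitution: {new_aa!r}")
    return cds[:start] + TARGET_PREFIX[new_aa] + codon[2] + cds[start + 3:]


def codon_window(cds, first_aa, last_aa):
    """Return the nucleotides encoding amino acids first_aa..last_aa (1-based, inclusive)."""
    return cds[3 * (first_aa - 1):3 * last_aa]


# Toy coding sequence Met-Ala-Lys (hypothetical, not taken from AJ000009).
toy_cds = "ATGGCTAAA"
for amino_acid in "STGV":
    print(amino_acid, mutate_alanine_codon(toy_cds, 2, amino_acid))
```

With the real NS3 coding sequence, `mutate_alanine_codon(ns3_cds, 156, "T")` would produce the A156T variant and `codon_window(ns3_cds, 135, 181)` the fragment submitted for local structure prediction; `ns3_cds` is a placeholder that the reader would have to extract from the genome annotation.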

Structural Analysis
In order to visualize the distribution of conserved and variable NS3 residues, the Protein Data Bank X-ray structures 3P8N and 2OC8 (available from http://www.rcsb.org/pdb) were considered as 3D models of the HCV-1b [42] and HCV-1a [43] NS3 protease, respectively, and graphically inspected with PyMOL (The PyMOL Molecular Graphics System, ver. 1.3, Schrödinger, LLC). The crystallographic structures were selected on the basis of the resolution of the models (3P8N, 1.90 Å; 2OC8, 2.66 Å), excluding those crystals that showed a large number of deletions and mutations compared to the reference sequences. The boceprevir-protease interactions were evaluated with Maestro GUI (Maestro Graphics User Interface, ver. 9.8, Schrödinger, LLC). To highlight the residues most relevant for boceprevir target recognition, the new computational approach GRID-based pharmacophore model (GBPM) was applied. This method, useful for designing pharmacophore models starting from detailed macromolecular structures, has been described in a recent publication [44]. In particular, it was developed to generate pharmacophore models useful for QSAR and virtual screening experiments by means of an unbiased computational protocol. The GRID-based pharmacophore model is created in a 6-step procedure. The first step pre-treats the PDB file, producing three different model structures: the complex (subunits a+b), the receptor (subunit a) and the ligand (subunit b). The second step calculates the GRID molecular interaction fields (MIFs) with a given probe on the three targets reported above. In the third step an energy comparison of the MIFs is performed by the GRID GRAB [45] utility, generating maps with focused information on the interaction areas. The fourth step is related to the identification