Molecular docking with Gaussian Boson Sampling

Photonic quantum devices called Gaussian Boson Samplers can be programmed to predict molecular docking configurations.

Gaussian Boson Samplers are photonic quantum devices with the potential to perform tasks that are intractable for classical systems. As with other near-term quantum technologies, an outstanding challenge is to identify specific problems of practical interest where these quantum devices can prove useful. Here we show that Gaussian Boson Samplers can be used to predict molecular docking configurations: the spatial orientations that molecules assume when they bind to larger proteins. Molecular docking is a central problem for pharmaceutical drug design, where docking configurations must be predicted for large numbers of candidate molecules. We develop a vertex-weighted binding interaction graph approach, where the molecular docking problem is reduced to finding the maximum weighted clique in a graph. We show that Gaussian Boson Samplers can be programmed to sample large-weight cliques, i.e., stable docking configurations, with high probability, even in the presence of photon loss. We also describe how outputs from the device can be used to enhance the performance of classical algorithms and increase their success rate of finding the molecular binding pose. To benchmark our approach, we predict the binding mode of a small molecule ligand to the tumor necrosis factor-α converting enzyme, a target linked to immune system diseases and cancer.

I. INTRODUCTION
In his lecture "Simulating Physics with Computers" [1], Richard Feynman famously argued that classical computing techniques alone are insufficient to simulate quantum physics. Since then, significant progress has been made in formalizing this intuition by finding explicit examples of quantum systems whose classical simulation can be convincingly shown to require exponential resources. An example is Boson Sampling, first introduced by Aaronson and Arkhipov [2]. In this paradigm, identical photons interfere by passing through a network of beam-splitters and phase-shifters, and are subsequently detected at the output ports of the network. Despite the simplicity of this model, it has been shown that, under standard complexity-theoretic conjectures, generating samples from the output photon distribution requires exponential time on a classical computer [2][3][4]. Several variants of boson sampling have been proposed that aim at decreasing the technical challenges with its experimental implementation [5][6][7][8][9][10][11][12].
Most efforts in the study of Boson Sampling have been focused on its viability to disprove the Extended Church-Turing thesis [13]; not on its potential practical applications. Nevertheless, it is possible to ask: if Boson Sampling devices are powerful enough that they cannot be simulated with conventional computers, is there a way of programming them to perform a useful task? In fact, practical applications of Boson Sampling have already been reported. In Ref. [14], it was shown that a Boson Sampling device can be used to efficiently estimate the vibronic spectra of molecules, a problem for which in general no efficient algorithm is known. Proof-of-principle demonstrations have also been reported [15,16]. Additionally, Refs. [17][18][19] discuss how a specific model known as Gaussian Boson Sampling (GBS) can be employed in combinatorial optimization problems concerned with identifying large clusters of data. Molecular docking is a computational method for predicting the optimal interaction of two molecules, typically a small molecule ligand and a target receptor. This method works by searching the configurational space of the two molecules and scoring each pose using a potential energy function. Using molecular structures to determine stable ligand-receptor complexes is a central problem in pharmaceutical drug design [20][21][22][23][24]. Several techniques for finding stable ligand-receptor configurations have been developed, including shape-complementarity methods [25][26][27][28][29][30] and molecular simulation of the ligandreceptor interactions [31,32], which vary in their computational requirements. For high-throughput virtual screening of large chemical libraries, it is desirable to search and score ligand-receptor configurations using as few computational resources as possible [33]. Motivated by these computational problems, several recent efforts have focused on practical applications of near-term quantum computers in the life sciences [34][35][36][37][38][39][40][41][42].
In this work, we show how GBS can be used to solve the molecular docking problem. We extend the binding interaction graph approach, where the problem of identifying docking configurations can be reduced to finding large clusters in weighted graphs [43,44]. We then show how GBS devices can be programmed to sample from distributions that assign large probabilities to these clusters, thus helping in their identification. Docking configurations can be obtained by direct sampling or by hybrid algorithms where the GBS outputs are post-processed using classical techniques. We apply our method through numerical simulations to find molecular docking configurations for a known ligand-receptor interaction [45]. Several therapeutic agents targeting this protein have entered into clinical trials for both cancer and inflammatory diseases [46].

II. BACKGROUND
Before presenting our results we provide relevant background information on Gaussian Boson Sampling, graph theory, and molecular docking.

A. Gaussian Boson Sampling
Quantum systems such as the quantum harmonic oscillator or the quantized electromagnetic field can be described by phase-space methods. Here, each state is uniquely determined by a quasi-probability distribution such as the Wigner function W (x, p) over its position x and momentum p variables [47]. A quantum state is called Gaussian if its Wigner function is Gaussian [48]. Any multi-mode Gaussian state ρ is parametrized by its first and second moments, namely the displacement α j = Tr[ρξ j ] and the covariance matrix σ with entries σ jk = Tr[{ξ j ,ξ k }]/2, whereξ j is a vector of creation and annihilation operators: calling M the number of modes, ξ j =â j = (x j +ip j )/ √ 2 andξ M +j =â † j for j = 1, . . . , M . Gaussian quantum states are ubiquitous in quantum optics, and have enabled detailed theoretical modeling and coherent manipulations in experiments [48,49].
In spite of their infinite-dimensional Hilbert space, Gaussian states can be simulated efficiently, as their evolution can be modeled by linear transformations such as Bogoliubov rotations [50]. However, when non-Gaussian measurements are employed, e.g., via photon-counting detectors [5,8] or threshold detectors [51], modelling measurement outcomes becomes extremely challenging even for supercomputers. Indeed, it has been shown that under standard complexity assumptions, sampling from the resulting probability distribution cannot be done in polynomial time using classical resources [2,8,11].
For a Gaussian state with zero displacement and covariance matrix σ, the Gaussian Boson Sampling (GBS) distribution obtained by measuring the state with photon-counting detectors is given by [8]: The set S = (n 1 , . . . , n M ) defines a measurement outcome, where n j is the number of photons in mode j, and the submatrix A S is obtained by selecting rows and columns of A, as described in Ref. [8]. The function Haf(A S ) is the Hafnian of A S , a matrix function which is #P-Hard to approximate for worst-case instances [52][53][54]. For a 2N × 2N matrix A, it is defined as where PMP is the set of perfect matching permutations, namely the possible ways of partitioning the set 1, . . . , 2N into subsets of size 2. When threshold detectors are employed [51], the output is a binary variable s j for each mode: s j = 1 corresponds to a "click" from the jth detector that occurs whenever n j > 0; on the other hand, s j = 0 for n j = 0. The probability distribution with threshold detectors can be obtained by summing infinitely many probabilities from Eq. (1) or via closedform expressions that require evaluating an exponential number of matrix determinants [51].

B. GBS to find dense subgraphs
When A is the adjacency matrix of an unweighted graph G, the Hafnian of A is equal to the number of perfect matchings in G. Using mathematical properties of the Hafnian, it was shown in Ref. [19] that a GBS device can be programmed to sample from a distribu- The parameter c depends on the spectral properties of A and can be tuned to lower the probability of observing photon collisions, i.e., n j ≥ 2 for some j. More details are provided in Appendix A. In the collision-free subspace, A S is the adjacency matrix of the subgraph specified by the vertices j for which n j = 1, and Haf(A S ) is equal to the number of perfect matchings in this subgraph. Therefore, a GBS device can be programmed to sample large-Hafnian subgraphs with high probability.
The density of a graph G is defined as the number of edges in G divided by the number of edges of the complete graph. Intuitively, a subgraph with a high number of perfect matchings should have a large density; a connection that was made rigorous in Ref. [55]. This fact was used in Ref. [17] to show that GBS devices can be programmed to sample dense subgraphs with high probability. Hybrid quantum-classical optimization algorithms can be built by combining GBS random sampling with stochastic optimization algorithms for dense subgraph identification.

C. Molecular docking
Molecular docking is a computational tool for rational structure-based drug discovery. Docking algorithms predict non-covalent interactions between a drug molecule (ligand) and a target macromolecule (receptor) starting from unbound three-dimensional structures of both components. The output of such algorithms are predicted three-dimensional orientations of the ligand with respect to the receptor binding site and the respective score for each orientation. Reliable determination of the most probable ligand orientation, and its ranking within a series of compounds, requires accurate scoring functions and efficient search algorithms [56]. The scoring function contains a collection of physical or empirical parameters that are sufficient to score binding orientation and interactions in agreement with experimentally determined data on active and inactive ligands. The search algorithm describes an optimization approach that can be used to obtain the minimum of a scoring function, typically by scanning across translational and rotational degrees of freedom of the ligand in the chemical environment of the receptor. In the simplest case, both the ligand and the receptor can be approximated as rigid bodies, but more accurate methods can account for inherent flexibility of the ligand and receptor [30]. As is the case for most molecular modelling approaches, a trade-off exists between accuracy and speed.
High-performance algorithms enable molecular docking to be used for screening large compound libraries against one or more protein targets. Molecular docking and structure-based virtual screening are routinely used in pharmaceutical research and development [57]. However, evaluating billions of compounds requires accurate and computationally efficient algorithms for binding pose prediction. Widely used approaches for molecular docking employ heuristic search methods (simulated annealing [58] and evolutionary algorithms [59]) and deterministic methods [60]. In one combinatorial formulation of the binding problem utilized in the DOCK 4.0 and FLOG [61,62], an isomorphous subgraph matching method is utilized to generate ligand orientations in the binding site [43,44,63]. In this formulation of the binding problem, both the ligand and the binding site of the receptor are represented as complete graphs. The vertices of these graphs are points that define molecular geometry and edges capture the Euclidean distance between these points. In order to strike a balance between the expressiveness of the graph and its size, we reduce the all-atom molecular models of the ligand and receptor to a pharmacophore representation [64,65].
A pharmacophore is a set of points which have a large influence on the molecule's pharmacological and biological interactions. These points may define a common subset of features, such as charged chemical groups or hydrophobic regions, that may be shared across a larger group of active compounds. For the purposes of this study, we define six different types of pharmacophore points: negative/positive charge, hydrogen-bond donor/acceptor, hydrophobe, and aromatic ring. In the graph representation, the type of the pharmacophore point is preserved as a label associated with its vertex. Hence we refer to this molecular graph representation as a labeled distance graph (see also Appendix B). As illustrated in Fig. 1, a labeled distance graph is constructed as follows for both the ligand and receptor: 1. Heuristically identify pharmacophore points likely to be involved in the binding interaction. These form the vertices of the graph.
2. Add an edge between every pair of vertices and set its weight to the Euclidean distance between the pharmacophore points they represent. 3. Assign a label to every vertex according to the respective type of pharmacophore point it represents.

III. GBS FOR MOLECULAR DOCKING
A. Mapping molecular docking to maximum weighted clique The labeled distance graphs described in Section II C capture the geometric three-dimensional shapes and the molecular features of both the protein binding site and the ligand that interacts with it. In this section, akin to [43], we combine these two graphs into a single binding interaction graph. Subsequently, we reduce the molecular docking problem to the problem of finding the maximum weighted clique. Panel A depicts the inputs for the construction of the binding interaction graph -two labeled graphs (one for the ligand and one for the receptor) and corresponding contact potential that captures the interaction strength between different types of vertex labels. We denote vertices on the ligand and receptor with upper and lower case letters respectively. The binding interaction graph is constructed (Panel B) by creating a vertex for each possible contact between ligand and the receptor, weighted by the contact potential. Pairs of vertices that represent compatible contacts (see Panel C for various scenarios) are connected by an edge. The resulting graph is then used to search for potential binding poses (Panel D). These are represented as complete subgraphs -also called cliques -of the graph, as they form a set of pairwise compatible contacts. The heaviest vertex-weighted cliques represent the most likely binding poses (maximum vertex-weighted clique depicted in red).
If two pharmacophore points are interacting, they form a contact. A binding pose can be defined by a set of three or more contacts that are not colinear. We model contacts as pairs of interacting vertices of the labeled distance graphs of the ligand and the binding site. Consider the labeled distance graph G L of the ligand and the labeled distance graph G B of the binding site, with their vertex sets V L and V B respectively. A contact is then represented by a singe vertex c i ∈ V L × V B . The set of possible contacts forms the vertices of the binding interaction graph. In principle, any pharmacophore point of the ligand could be interacting with any pharmacophore point of the binding site, and therefore we have to consider every possible pair of corresponding interacting vertices. Hence the number of vertices of the binding interaction graph is nm, where n is the number of vertices of the labeled distance graph G L and m is the number of vertices of the labeled distance graph G B .
The goal of the binding interaction graph is to model possible binding poses via sets of contacts. However, not every combination of contacts is physically realizable. Two contacts are not be compatible if their mutual realization would violate the geometrical shapes of the ligand and the binding site. To model this, the binding interaction graph contains an edge between two contacts if and only if they are compatible. As a result, a pairwise compatible set of contacts, i.e., such as would arise from a true binding pose, forms a complete subgraph of the binding interaction graph. A complete subgraph, also called a clique, in a graph G is a subgraph where all possible pairs of vertices are connected by an edge.
The compatibility of contacts is captured by the notion of τ flexibility, which is illustrated in Fig. 2 (see also Appendix B). Even though both the ligand and the binding site can exhibit a certain amount of flexibility, in general, geometric distances between two contacts have to be approximately the same both on the ligand and the binding site. Two contacts (v l1 , v b1 ) and (v l2 , v b2 ) form a τ flexible contact pair if the distance between the pharmacophore points on the ligand (points corresponding to vertices v l1 and v l2 ) and the distance between the pharmacophore points on the binding site (points corresponding to vertices v b1 and v b2 ) does not differ by more than τ +2 (see Panel C in Fig. 2). The constants τ and describe the flexibility constant and interaction distance respectively.
In order to model varying interaction strengths between different types of pharmacophore points, we associate a different weight to every vertex of the binding interaction graph. The weights are derived using the pharmacophore labels that are captured in the labeled distance graphs of the ligand and the binding site. Given a set of labels L, a potential function κ : L×L → R is applied to compute the weights of the individual vertices. This allows us to bias the algorithm towards stronger intermolecular interactions. Potential functions can be derived in several ways, ranging from pure data-based approaches such as statistical or knowledge-based potentials [66][67][68] to quantum-mechanical potentials [21]. Details of the potential used in this study are described in Section IV.
Hence under the model derived in this study, the most likely binding poses correspond to vertex-heaviest cliques in the binding interaction graph. The problem of finding a maximum weighted clique is a generalization of the maximum clique problem of finding the clique with the maximum number of vertices. When G has n vertices, the number of possible subgraphs is O(2 n ), so a brute force approach becomes rapidly infeasible for growing values of n. The max-clique decision problem is NP-hard [69]: as such, unless P=NP, in the worst case any exact algorithm run for superpolynomial time before finding the solution. There are deterministic and stochastic classical algorithms for finding both the maximum cliques and maximum weighted cliques, or for finding good approximations when n is large [70].

B. Max weighted clique from GBS
In this section, we show that a GBS device can be programmed to sample from a distribution that outputs the max-weighted clique with high probability. The main technical challenge is to program a GBS device to sample, with high probability, subgraphs with a large total weight that are as close as possible to a clique. Consider the graph Laplacian L = D−A, where D is the degree matrix and A the adjacency matrix. The normalized Laplacian [71]L = D −1/2 LD −1/2 is positive semidefinite and its spectrum is contained in [0, 2]. More generally, we define a rescaled matrix where Ω is a suitable diagonal matrix. If the largest entry of Ω is bounded as shown in Appendix A, then the spectrum of B is contained in [0, c], where c ≤ 1 can be tuned depending on the maximum amount of squeezing obtainable experimentally. Using the decoupling theorem from Appendix A, we find that a GBS device can be programmed to sample from the distribution where we consider outputs S = (n 1 , . . . , n M ) with n j ≤ 1 and N = j n j total photons. In the collision-free subspace, the dependence on the diagonal matrix D disappears so we may focus on programming GBS with a rescaled adjacency matrix ΩAΩ. From a GBS sample S, we construct the subgraph H of G made by vertices j with n j = 1. The matrix A S is the N × N adjacency matrix of H. The Hafnian of an adjacency matrix is maximum for the complete graph, namely when H is a clique. Therefore, for a fixed total number of photons N , the Hafnian term maximizes the probability of detecting photon configurations that correspond to a clique. Different choices are possible for the weighting matrix Ω. For an unweighted graph, convenient choices are either a constant Ω or Ω ∝ D. In the former case, det Ω S = c N for c < 1, so the parameter c can be tuned via squeezing in order to penalize larger N , i.e., larger subgraphs (see Appendix A 3). In the latter case, det Ω = c N det D is proportional to the Narumi-Katayama index [72], which describes some topological properties of the graph. Similarly to the Hafnian, it is maximum when H is a clique.
For a vertex-weighted graph, we can use the freedom of choosing Ω to favour subgraphs with larger total weight.
There are multiple ways of introducing the weights w j in Ω and a convenient choice is where c is a normalization to ensure the correct spectral properties and α > 0 is a constant. When α is small, the determinant term det Ω S ≈ 1 + α j:nj =1 w j is large when the subgraph H has a large total weight. This is useful for the max-weighted clique problem as it introduces a useful bias in the GBS probability of Eq. (4) that favours heavier subgraphs. However, if α is too large, the Hafnian term in Eq. (4) becomes less important and GBS will sample heavy subgraphs that typically do not contain cliques. To prevent this occurrence, the parameter α must be chosen carefully. Ideally, the weights should give a positive bias to heavy cliques, but should not favour heavy subgraphs that are not cliques. More details are discussed in Appendix A.
C. Hybrid algorithms GBS devices can in principle have a very high sampling rate -primarily limited by detector dead timeso just by observing the photon distribution it is possible to extract the maximum weighted clique for small enough graphs. We call this simple strategy GBS random search -see Fig. 3 for a graphical explanation of the method. However, selecting photon outcomes that correspond only to cliques means wasting samples that are potentially close to the solution. Indeed, an optimally programmed GBS device will sample from both the correct solution and neighboring configurations with high probability. Therefore, we propose two algorithms to post-process all GBS data which incur an overhead in run time but are especially useful for finding cliques in larger graphs.
Greedy Shrinking: Starting from an output subgraph H from GBS, vertices are removed based on a local rule until a clique is found -see Fig. 3 for a graphical explanation of the method. Removal is based on vertex degree and weight. Vertices with small degree are unlikely to be part of a clique making them good candidates to be discarded. The role of the weights is less straightforward: vertices with low weight may not be part of the maxweighted clique, but this assumption may be incorrect if the clique is made by a heavy core together with a few light vertices. Because of this, vertex degree is prioritized over vertex weight during the greedy shrinking stage. More precisely, the algorithm proceeds as follows: 1. From a GBS outcome, build a subgraph H with vertices corresponding to the detectors that "click".
2. If H is a clique, return H. Expansion with Local Search: GBS provides high-rate samples from max-cliques, and greedy shrinking enhances the probability of finding a solution via classical postprocessing of sampled configurations. We may increase the probability of finding the solution even further, at the cost of a few more classical steps. This is done by employing a local search algorithm that tries to expand the clique with neighbouring vertices, as shown also in Fig. 3. Algorithms such as Dynamic Local Search (DLS) [73] and Phased Local Search (PLS) [74] are among the best-performing classical algorithms for max-clique [70]. These algorithms usually start with a candidate clique formed by a single random vertex, and then try to expand the clique size and replace some of its vertices by locally exploring the neighbourhood. More precisely, the following iteration is repeated until a sufficiently good solution is found, or the maximal number of steps is reached: 1. Grow stage: Starting from a given clique, generate the set of vertices that are connected to all vertices in the clique. If this set is non-empty, select one vertex at random, possibly with large weight, and add it to the clique.
2. Swap stage: If the above set is empty, generate the set of vertices that are connected to all vertices in the clique except one (say v). From this new set, select a vertex at random and swap it with v. This gives a new clique of the same size but with different vertices, thus constituting a local change to the clique. For max-weighted clique, the swapping rule also considers vertex weight.
An important aspect of the above local search is that, at each iteration step, the candidate solution is always a clique and the algorithm tries to expand it as much as possible. GBS can be included in this strategy in view of its ability to provide a starting configuration that is not a mere random vertex. Indeed, a GBS output after greedy shrinking is always a clique, with a comparatively large probability of being close to the maximum clique.
In case the candidate output from greedy shrinking is not the maximum clique, then it can be expanded with a few iterations of local search. Since the cliques sampled from a carefully programmed GBS device are, with high probability, larger than just a random vertex, the number of classical expansion steps is expected to be significantly reduced. This will be demonstrated with relevant numerical examples in the following section.

IV. NUMERICAL RESULTS
We study the binding interaction between the tumor necrosis factor-α converting enzyme (TACE) and a thiolcontaining aryl sulfonamide compound (AS). TACE was chosen due to the planar geometry of the active site cleft and its high relevance to the pharmaceutical industry. Due to its role in the release of membrane-anchored cytokines like the tumor necrosis factor-α, it is a promising drug target for the treatment of certain types of cancer, Crohn's disease and rheumatoid arthritis [75][76][77]. The ligand under consideration is part of a series of thiol-containing aryl sulfonamides which exhibit potent inhibition of TACE, and is supported by a crystallographic structure [45]. This complex provides an important testbed to benchmark our GBS-enhanced method. As we will show, our method is able to find the correct binding pose without requiring all-atom representation or simulation of the ligand/receptor complex.
The binding interaction graph for the TACE-AS complex is constructed by first extracting all the pharmacophore points on ligand and receptor using the software package rdkit [78]. To simplify numerical simulations, we identity the relavant pairs of pharmacophore points on the ligand and receptor that are within a distance of 4Å of each other, and whose label pairs are either hydrogen donor/acceptor, hydrophobe/hydrophobe, negative/positive charge, aromatic/aromatic. After this procedure, we get 4 points on the ligand and 6 points on the receptor and create two labelled distance graphs as illustrated in Fig. 1. The knowledge-based potential is derived by combining information from PDBbind [79][80][81], a curated dataset of protein-ligand interactions, and the Drugscore potential [82][83][84]. More details are presented in Appendix C, where the resulting knowledge-based potential is shown in Table S1.
Using this knowledge-based potential, we combine the two labelled distance graphs into the TACE-AS binding interaction graph as shown in Fig. 2. A summary of our graph-based molecular docking approach is shown in Fig. 4, which includes a molecular rendering of the predicted binding interactions of the AS ligand in the TACE binding site using the crystallographic structure of this complex (PDB: 2OI0) ??. These interactions correspond to the maximum vertex-weighted clique in the TACE-AS graph. This set of pharmacophore interactions can be used as constraints in a subsequent round of molecular docking to deduce three-dimensional structures of the ligand-receptor complex ??. We now study the search of the maximum weighted clique on the TACE-AS graph via a hierarchy of algorithms in increasing order of sophistication. As discussed previously, these are: 1. Random search: Generate subgraphs at random and pick the cliques with the largest weight among the outputs.
2. Greedy shrinking: Generate a large random subgraph and remove vertices until a clique is obtained. Vertices are removed by taking into account both their degree and their weight.
3. Shrinking + local search: Use the output of the greedy shrinking algorithm as the input to a local search algorithm.
These form a hierarchy in the sense that random search is a subroutine of greedy shrinking, which is itself a subroutine of shrinking + local search. For each of these algorithms we compare the performance of standard classical strategies with their quantum-classical hybrid versions introduced in Sec. III C, where the random subgraph is sampled via GBS. For a fair comparison with GBS-based approaches, the classical data is generated as follows: we first sample a subgraph size N from a normal distribution with the same mean N and variance ∆N 2 as the GBS distribution, then uniformly generate a random subgraph with size N . We begin our analysis with a pure GBS random search. We consider GBS with threshold detectors, which register measurement outcomes as either 'no-click' (absence of photons) or 'click' (presence of one or more photons). The correct ligand-receptor superposition corresponding to the maximum weighted clique in the TACE-AS graph is shown on the right. Panel C visualizes the crystallographic structure of the TACE-AS complex with optimal ligand-receptor interactions correctly predicted by the maximum weighted clique. We omit the metal cofactor in the enzyme active site for visual clarity, as it was not considered as a pharmacophore point under our procedure.
We employ either a brute force approach to calculate the resulting probability distribution or, when that becomes infeasible, the exact sampling algorithm discussed in Refs. [51,85]. Given the complexity of simulating GBS with classical computers, for simplicity in numerical benchmarking, we first consider the simpler case where the maximum clique size is known, so we can post-select GBS data to have a fixed number of detection clicks. This drastically simplifies numerical simulations (see Appendix A 4 for details), at the expense of disregarding data that would otherwise be present in an experimental setting.
For the TACE-AS binding interaction graph, the largest and heaviest cliques both have eight vertices, so we fix N = 8. There are a total of 19 cliques of this size in the graph (see also Fig. S1 in the Appendix D). In Fig. 5 we show the outcomes of a numerical experiment where a GBS device has been programmed to sample from the Hafnian of ΩAΩ, with Ω as in Eq. (5). For simplicity, we choose α = 1 in Eq. (5), although performance can be slightly improved with optimized values of α. On the other hand, the parameter c does not play any role in the post-selected data, but it does change the overall probability of getting samples of size N = 8. For comparison, we have also studied a purely classical random search, where each data is a uniform random subgraph with N vertices. We observe only three cliques over 10 5 samples. On the other hand, as shown in Fig. 5, GBS is able to produce roughly 300 cliques directly from sampling, without any classical post-processing. This indicates that the GBS distribution is indeed favouring cliques with large weights, as intended.
Post-selecting on the number of detector clicks is an unwise strategy when employing real GBS devices because it disregards otherwise useful samples. Moreover, the size of the maximum weighted clique is generally unknown. Instead, we can generate cliques from every sample by employing the shrinking strategy discussed in Section III C.
In Fig. 6 we study the performance of greedy shrinking with GBS data. These data consist of 10 4 samples obtained from an exact numerical sampling algorithm [85]. Each sample corresponds to a subgraph and, un- like Fig. 5, here any subgraph size is considered. These results show that with GBS and greedy shrinking -a simple classical post-processing heuristic -it is possible to obtain the maximum weighted clique with sufficiently high probability. Indeed, the histogram in Fig. 6 has a sharp peak corresponding to the clique of maximum size N = 8 and maximum weight ≈ 3.99. The success rate in sampling from the max weighted clique is ≈ 12% and the overall sampling rate for N = 8 cliques is ≈ 19%. Greedy Shrinking with purely classical random data is shown in the Supplementary Fig. S2. Although the classical distribution is chosen to have the same mean and variance as the GBS distribution, its performance is considerably worse: the maximum weighted clique is obtained only 1% of the time, compared to 12% for GBS. This shows that GBS with greedy shrinking is already able to find the maximum weight clique of the graph after only a few repetitions.
Finally, we study how the cliques obtained from GBS with greedy shrinking can be enlarged or improved via local search. Fig. 7 shows the performance of the hybrid GBS shrinking + local search algorithm, compared to a classical strategy. The results indicate that GBS not only provides better initial estimates after greedy shrinking (zero iteration steps), but it maintains a significant margin compared to classical strategies as the number of steps is increased. After k = 8 local expansion steps, the probability of finding the maximum weighted clique is as high as 60%, while the classical strategy has a considerably smaller success rate of < 30%. After many steps, the success rate saturates: using GBS the success rate gets close to 70%, while for the purely classical approach it remains under approximately 35%. The role of noise and squeezing is discussed in Appendix D, where we show that GBS success rate is not diminished by the effect of noise, provided that the amount of squeezing is increased accordingly. Therefore, GBS shrinking and its variant with local search are robust against noise, maintaining a significant margin compared to purely classical strategies.

V. CONCLUSIONS
We have shown that Gaussian Boson Sampling (GBS) can be employed to predict accurate molecular docking configurations, a central problem in pharmaceutical research and development. This is achieved by first mapping the docking problem to the task of finding large cliques in a vertex-weighted graph, then programming the GBS device to sample these cliques with high prob-ability. This constitutes an example of the viability of near-term quantum photonic devices to tackle problems of practical interest.
Established algorithms for obtaining molecular docking configurations exist, but a challenge arises in the context of industrial drug design where large numbers of candidate molecules must be screened against a drug target. In this case, a fast method for predicting docking configurations is required. In principle, photonic devices such as Gaussian Boson Samplers can operate at very high rates, and may potentially provide solutions in shorter timeframes. Additionally, by sampling better random subgraphs, GBS serves as a technique to enhance the performance of classical algorithms because it increases the success rate of identifying large weighted cliques. This property is relevant and applicable in any context where identifying clusters in graphs is important, beyond applications in molecular docking.
More broadly, our results establish a connection between seemingly disparate physical systems: the statistical properties of photons interacting in a linear-optical network can encode information about the spatial configuration of molecules when they combine to form larger complexes. In other words, we have found that when the interaction between fundamental particles is carefully engineered, they acquire collective properties that can be probed to perform useful tasks. A complete understanding of the capabilities of emerging quantum technologies may thus require further exploration of systems that, even if incapable of universal quantum computation, can still be programmed to exhibit properties that can be harnessed for practical applications. squeezing parameter r. Being pure, the A matrix is written as A = B ⊕ B * and, for a single mode, B = tanh(r). For maximum squeezing r max we find that B can take any value in [0, c] with c = tanh(r max ). The resulting average photon number is then N = sinh(r) 2 = c 2 1−c 2 and the variance is ∆N 2 ∝ N (1 + N ). For multiple modes the expressions are similar, though B is a matrix and N = Tr[ B 2 I−B 2 ], so the normalization factor can be tuned to provide a higher rate to subgraphs of different sizes N . Although the maximum clique size is not known a priori, an estimate, e.g. based on random graphs [86], is normally enough as the large variance ∆N 2 ≈ N (1 + N ) assures that different sizes are sampled with sufficiently high rate.
Gaussian boson sampling using click detectors yields a discrete probability distribution over subsets S N of {1, . . . , M } of dimension N . We write i ∈ S N if the ith detector "clicks" and i / ∈ S N otherwise. The resulting probability distribution is [51] p(S N ) = Tr where P i 0 = |0 i 0 i | is the projection into the zero photon state and P i 1 = 1 1 − P i 0 . The average number of clicks N is then where ρ j is the reduced state on mode j. Using the fidelity formula for Gaussian states [87] we then get where σ j is the reduced (2 × 2) covariance matrix for mode j. The above equation can be solved to bias the number of clicks. When the covariance matrix σ depends on the normalization factor c, we can use a simple line search algorithms to tune c such that N [σ(c)] is equal to the desired value. . Each probability requires the evaluation of O(2 N ) determinants, so the complexity is still exponential as a function of N [51]. However, focusing on postselection with a certain size N reduces the complexity of brute force approaches from exponential to polynomial, although the degree of this polynomial increases with N .

5.
Selecting parameter α For a complete graph with 2n vertices the Hafnian is h n = 2n! n!2 n . The largest Hafnian for non-complete graphs is obtained by removing an edge from the complete graph. The Hafnian is then 2n−2 2n−1 h n , so this non-optimal graph is penalized by a factor 2n−2 2n−1 1− 1 2n . A possible choice for α is to avoid a counterbalance of this term, so 1+αw tot < 2n−1 2n−2 . Nonetheless, we have numerically observed that, at least for sparse graphs, the parameter α does not have to be carefully chosen, and different values of α provide the expected enhancement for the max weighted clique problem.

Appendix B: Graph representations of molecular interactions
In this section, we use L to denote the set of all labels corresponding to the individual pharmacophore point types and κ to denote the potential function κ : L × L → R that assigns an interaction strength to each pair of labels from L.

Labeled distance graph
Definition B.1. Labeled distance graph. Let S be a set of points in three dimensional space S = (x i , y i , z i ) | i ∈ I for a given index set I heuristically selecting pharmacophore points of a component (either ligand or the binding site) involved in the binding complex. Then labeled distance graph G S is defined as G S = (V S , E S , ω S , α S ) where is the set of vertices, is the set of edges, is the weighting function of the edges and is a function assigning a pharmacophore point type to each vertex.
Remark. Any labeled distance-graph is a complete graph with I vertices.

Binding interaction graph
Let G L = (V L , E L , ω L , α L ) be a ligand labeled distance graph and G B = (V B , E B , ω B , α B ) labeled distance graph for the binding site. Any pair of vertices where τ is the flexibility constant and is the interaction cutoff distance.
Remark. Mutual τ flexibility of contact pairs is a reflexive and symmetric relation, but not necessarily transitive.
For multiple contact pairs to be realized in the binding pose they have to not violate each other's geometric constraints and hence be pairwise τ flexible . In the following graph representation, this corresponds to a clique: Definition B.3. Binding interaction graph. Let G L = (V L , E L , ω L , α L ) be a labeled distance-graph for a given ligand and G B = (V B , E B , ω B , α B ) a labeled distancegraph for a given binding site. The corresponding binding interaction graph I L,B is defined as where vertex set V is the set of the pairs over vertex-sets of G L and G B and τ, ∈ R + are the flexibility threshold constant and interaction cutoff distance. Then (B7) is a maximal set of τ flexible contact pairs between G L and G B and Ω : V → R is an vertex-weighting function defined as which encodes the interaction strength between pharmacophore points corresponding to vertices v l and v b .
Remark. Most favourable binding pose of ligand described by labeled distance graph G L and binding site described by labeled distance graph G B corresponds to the heaviest vertex-weighted clique of binding interaction graph I L,B .  TABLE S1. Knowledge-based pharmacophore potential. Data is derived from the PDBbind dataset from 2015 [79][80][81].
The matrix is lower-diagonal since any potential function is symmetric.
FIG. S1. Position of all the maximum cliques. We focus on the TACE-AS graph, where cliques are shown with orange notes, darker edges. The diameter of each vertex is proportional to its weight. The cliques are ordered from low to high total weight, starting from the top-left until the bottom-right order. The heaviest clique is shown in red in the last graph.
two main clusters in the graph: the top right cluster, generally with light weights, and the bottom left cluster with heavy weights. There are also a couple of intermediate cliques where these two clusters are mixed. The maximum weighted clique is shown in the bottom graph, where from node diameter we observe that it is composed by a heavy six-vertex core and two light vertices. Comparison with Fig. 5 shows that all lightweight cliques have a low occurrence rate in a carefully programmed GBS device.
In Fig. S2 we show the output of Greedy Shrinking with purely classical random data. For a fair comparison with the GBS-based approach shown in Fig. 6, the classical data are generated as follows: we first sample a subgraph size N from a normal distribution with the same mean N and variance ∆N 2 as the GBS distribution, then uniformly generate a random subgraph with size N . Although the resulting distribution has the same mean and variance as the GBS distribution, by comparing Fig. S2 and Fig. 6, we see that its performance is considerably worse: the maximum weighted clique is obtained only 1% of the time, compared to 12% for GBS.
In Fig. S3 we study the effect of noise and squeezing. The value r max = 0.9702 corresponds to an average number of detector clicks N 8. In the lossy case, for a fair comparison, we have increased the squeezing  to r max = 0.9780 in order to maintain the same average N 8 and have, accordingly, samples of the same average size. As Fig. S3 shows, the success rate is not diminished by the effect of noise, provided that the amount of squeezing is increased accordingly. As a matter of fact, the noisy version with larger squeezing displays a similar success rate after greedy shrinking (iteration 0). As the iterations increase, the success rate of both noisy and noiseless GBS maintain a significant margin compared to the purely classical strategy. The slightly better performance of the noisy case is due to the larger squeezing that changes the shape of the photon distribution, while keeping comparable photon averages with the noiseless case. This analysis shows that both GBS shrinking and its variant with local search are robust against noise, maintaining a significant margin compared to purely classical strategies.