If you thought the hype for artificial intelligence was off the charts, I think in a few years youâ€™ll be shocked at the hype around quantum computing. If youâ€™re unfamiliar with it, quantum computing is a novel computing paradigm where computers exploit novel quantum effects to get exponential speed ups in certain tasks. In particular, jobs like cracking our most secure passwords become trivial with a sufficiently powerful quantum computer. Since nothing worth doing is easy, you probably arenâ€™t surprised to learn that the task of building a quantum computer is a monumental one.

One of the big problems researchers have to tackle is circuit design. The un-intuitive nature of quantum mechanics means that it can be difficult to design circuits. It requires deep domain expertise, which lets face it, is in short supply. There arenâ€™t too many quantum mechanics roaming around. So finding a better solution to this problem would go at least part of the way towards making quantum computers viable.

A few years ago, researchers out of the university of Maryland, and NIST, demonstrated a novel use of deep reinforcement learning. They showed that an RL agent can be used to design quantum circuits, without any prior knowledge of physics. Thatâ€™s right. Zero domain knowledge is needed. The agent simply has to be able to create circuits, compare them to a desired output, and use a reward as a feedback mechanism to tease out the physics of designing quantum circuits.

Donâ€™t worry. You donâ€™t need to be a quantum physicist to appreciate this discovery; Iâ€™m going to break it down in simple terms.

### Quantum Superposition and Entanglement

One thing we take for granted in the world of classical physics is that systems are either in one state or another. If we have some bit of information represented a transistor, the transistor is either on or off. It canâ€™t be some combination of both at the same time. Surprisingly, this isnâ€™t the case in quantum systems.

Quantum systems can be prepared in what are called *superpositions* of states. This means the system can be both in the ground and excited state, for example. Keep in mind this is only pre-measurement. When the system is measured, a definitive state is picked and we get only a single answer. If we prepare a large ensemble of systems in an identical way, say in an equal superposition of ground and excited states, after measuring all the particles weâ€™ll get a 50% distribution for each state. This is called the expectation value of our measurement.

Another interesting fact is that the measurement process is completely random. We canâ€™t predict which bit of our system will end up in which state; we can only make statistical statements about what measurement weâ€™ll get.

As strange as a superposition of states may seem, quantum entanglement ratchets up the bizarre factor to 11.

As powerful as it is misunderstood, quantum entanglement really is at the heart of how quantum computers could be so powerful. The idea is that we can prepare a two particle system such that the states of both particles are correlated. For instance, if we have two particles, and they can each be in one of two states (i.e. a two particle two state system), we can prepare them in such a way that if one of the particles is in state 1, the other is in state 2. Or vice versa, or both could even have the same state.

On the surface, this may not seem all that useful, but in the context of encoding information onto bits (in the quantum case, they are called qubits), this effect gives rise to two consequences. The first of which is that we can encode an exponentially larger amount of information than in the classical case. The second is that we can now perform parallel processing, thus giving an exponential speed up in calculations.

The combination of exponentially increased information density and computational speed means that we can perform classically intractable calculations in a nearly trivial amount of time. The precise details are fascinating, but are frankly beyond the scope of what weâ€™re discussing here. Suffice it to say, quantum systems can be prepared in novel ways that allow us to encode and process on large volumes of information using simple two state systems of n particles.

### Bell and GHZ States

Perhaps the most simple and common entangled state is called a Bell state. There are several of those, but letâ€™s take a look at just one of them.

$$|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$$

In this state, both qubits are in an even superposition of both the ground and excited states. The one over square root 2 is a normalization factor, such that when we perform a measurement weâ€™re guaranteed to get one state or the other. This is an important property of quantum mechanical systems: they are normalized such that probability is conserved.

The importance of the Bell state is that we only have to measure 1 of the particles of the system to determine the state of both. We know that no matter how many times we measure, if we get state 0 for qubit 1, weâ€™re also going to get state 0 for qubit 2. The practical applications of this state canâ€™t be understated. Itâ€™s foundational to quantum key distribution and the Bell state is a fundamental building block in many quantum algorithms and error correction codes (more on that in a moment).

The N qubit generalization of the Bell state is called the GHZ (Greenberger-Horne-Zeilinger) state. For three qubits, we can write a GHZ state as

$$|GHZ\rangle = \frac{1}{\sqrt{2}}(|000\rangle + |111\rangle)$$

Once again, this system is in a superposition of states with all three qubits entangled. When we take a measurement, weâ€™re only going to get either |000> or |111> with 50 / 50 probability. But the key is we only have to measure a single qubit, instead of measuring all three. Once we know qubit 1 is in state |1>, the others are in state |1> as well.

### Quantum Noise and Error Correction

In practical terms, noise is present in every system. Error correction is critical even for classical computer systems. For instance, error correction in RAM is handled with ECC RAM, which youâ€™ll find in many high performance computing clusters. The problems with noise are only amplified at the quantum scale, due to the physical nature of the systems weâ€™re dealing with.

This is because quantum systems typically operate at very, very low temperatures. In general, modern quantum computers operate at the millikelvin temperature scale, meaning just a few tenths of a degree above absolute zero. Any interaction with the environment can leak ambient heat into the system and destroy the fragile quantum states.

Thus, one very active area of research in quantum computing is error correction and noise reduction. A full treatment of noise and strategies for coping with it is beyond the scope of a simple blog article, but suffice it to say if weâ€™re going to simulate quantum circuits, and train an RL agent to design them, we should probably incorporate noise into our simulations. Weâ€™ll return to this more in a minute when we get to the actual paper.

First, we need to talk about how we build quantum states.

### Quantum Gates and Operations

Quantum states are prepared, in a quantum computer, the application of certain operations on our qubits. These gates can change the states of our qubits, swap their states, and create superpositions.

Combinations of gates form quantum circuits, and these circuits are what represent quantum algorithms.

Once again, a full treatment is beyond the scope of what weâ€™re trying to do here, but the key point is that we can manipulate qubit states using gates, and we can use gates to perform computations using a quantum computer.

With enough background, weâ€™re ready to tackle the main ideas of the paper.

### Paper Analysis

If youâ€™re not familiar with reinforcement learning, the basic idea is that we have some system we want to optimize. Our system can be described some state, and the agent has some actions available to it. Taking an action causes the system to transition from one state to another, and issue a reward to our agent. All reinforcement learning agents seek to maximize total long term reward, generally accessing the most profitable states of the environment.

In the case of the present work, the state of the system is comprised of expectation values of the Pauli matrices. In plain language, this just means the tendencies of the qubits to be in states 0 or 1, or some superposition of them. There are three such matrices (X, Y, Z) and so our state is comprised of 3 expectation values for each qubit, or 3 x N values.

Our action space is comprised of the set of all possible gates the agent can apply to a specific qubit.

The reward is a bit more interesting. The agent is given a reward of -0.01 for each step, creating an incentive to make simple circuits. Once the final state is achieved, the circuit is compared to the target (i.e. the Bell state). How close it gets is called the fidelity(F), and the agent is given F â€“ 0.01 reward. Thus, the agent is incentivized to create the simplest and most accurate circuit it can.

Episode lengths are capped at 20 steps, so the worst possible reward is -0.2 and the best possible reward is 0.99

### Results

The results are pretty impressive. Using PPO, they were able to converge on the solution in both the 2 and 3 qubit noise free cases, as well as the noisy 2 qubit case. In all three cases, average rewards trend towards the 0.99 theoretical maximum, over the course of a reasonable number of episodes.

In the original paper they also investigated the use of A2C with similar results, but slower convergence. In real terms, however, these simulations run in under an hour, so the delta in performance is actually quite negligible.

### Discussion

Itâ€™s important to question how useful this result actually is. After all, itâ€™s only 2 and 3 qubit systems and real quantum computers (when theyâ€™re eventually built) will require thousands of them. The short answer is that yes, this result is still useful. You can construct quantum circuits of arbitrary size combining 2 and 3 qubit sub-circuits. The trade off, as always, is computational complexity.

The inclusion of noise, and the demonstration that RL can handle noisy systems goes a long way towards making this a viable strategy.

Another potential avenue for future research is circuit optimization. Here, the authors started from a blank slate and constructed circuits from scratch. Itâ€™s entirely possible to take existing circuits and optimize them further using reinforcement learning. In fact, this is the topic of a paper weâ€™ll take a look at later, where researchers from IBM used PPO to optimize quantum circuits for various architectures.

### Conclusion

This is just the beginning of using RL to advanced quantum computation. This paper showed that, even with modest resources and an off the shelf algorithm, non-trivial quantum circuits can be constructed and executed with high fidelity. What the future holds is uncertain, but I canâ€™t wait.