https://doi.org/10.1051/epjconf/202429512003
Measurements With A Quantum Vision Transformer: A Naive Approach
1 Physics Department, University of California Santa Cruz, Santa Cruz, CA 95064
2 CERN, 1211 Geneva 23, Switzerland
1 The terms query, key, and value originate from retrieval systems, in which a search engine maps a query (e.g. the text entered in a search bar) against the keys (e.g. descriptors such as video title, description, etc.) of indexed items and returns the best-matched items (the values) to the user.
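To make this retrieval analogy concrete, the following minimal NumPy sketch (not part of the published work; the array sizes and names are purely illustrative assumptions) implements classical scaled dot-product attention, in which each query is scored against all keys and the resulting softmax weights select a blend of the values:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score each query against every key, scaled by the key dimension.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # A softmax over the keys turns scores into retrieval weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Return a weighted mixture of the values for each query.
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 queries, 4-dimensional embeddings
K = rng.normal(size=(5, 4))  # 5 indexed items (keys)
V = rng.normal(size=(5, 4))  # the corresponding values
print(scaled_dot_product_attention(Q, K, V))  # shape (3, 4)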
Published online: 6 May 2024
Transformers are gaining widespread use in mainstream machine learning. As Vision Transformers rise in popularity in computer vision, they are being applied to an increasingly wide range of machine learning tasks. In particular, transformers continue to be investigated for High Energy Physics (HEP) experiments, for tasks including jet tagging, particle reconstruction, and pile-up mitigation.
An improved Quantum Vision Transformer (QViT) with a quantum-enhanced self-attention mechanism is introduced and discussed. A shallow circuit is proposed for each component of self-attention in order to leverage current Noisy Intermediate-Scale Quantum (NISQ) devices. Variations of the hybrid architecture and model are explored and analyzed.
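By way of illustration only, a shallow per-component circuit of the kind described above could resemble the following PennyLane sketch, in which a single angle-encoding layer followed by one entangling layer stands in for the classical query, key, and value projections of one token; the library choice, qubit count, and ansatz are assumptions and not the circuit used in this work:

import pennylane as qml
from pennylane import numpy as np

n_qubits = 4  # assumed token embedding dimension; not taken from this work
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def shallow_projection(token, weights):
    # Encode the token features as rotation angles (data encoding).
    qml.AngleEmbedding(token, wires=range(n_qubits))
    # A single entangling layer keeps the circuit shallow for NISQ hardware.
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    # One expectation value per qubit gives back a feature vector.
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

# Separate trainable parameters for the query, key, and value components.
shape = qml.BasicEntanglerLayers.shape(n_layers=1, n_wires=n_qubits)
params = {name: np.random.uniform(0, np.pi, size=shape)
          for name in ("query", "key", "value")}

token = np.random.uniform(0, np.pi, size=n_qubits)
q, k, v = (shallow_projection(token, params[name])
           for name in ("query", "key", "value"))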
The results demonstrate a successful proof of concept for the QViT and establish a competitive performance benchmark for the proposed design and implementation. The findings also provide strong motivation to experiment with different architectures, hyperparameters, and datasets, setting the stage for deployment in HEP environments, where transformers are increasingly used in state-of-the-art machine learning solutions.
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.