Draft:Hyena (technology)

Hyena is an architecture for building large language models (LLMs), a form of generative artificial intelligence, developed by researchers at Stanford University and released in March 2023.[1] It was designed as a computationally more efficient alternative to the transformer architecture that underlies popular LLMs such as ChatGPT from OpenAI, Gemini from Google, Claude from Anthropic, and Llama from Meta. Hyena replaces the attention mechanism of the transformer architecture with a combination of long convolutions and data-controlled gating mechanisms. This allows Hyena to achieve sub-quadratic time complexity, reducing computational costs compared with transformer-based models while demonstrating similar performance.[2][3][4]

Large Language Models

Large language models work by first being trained on massive amounts of text gathered from a variety of sources.[5] During the training phase, this "training data" is ingested into the model and processed as "tokens": pieces of text as short as a single character or as long as a word or part of a word.[6] Once trained, the model can be given a prompt or query ("input") in the inference phase and then generate a response ("output") by predicting the next token in a sequence.[5] Generally, LLMs trained on larger data sets perform better than those trained on smaller data sets, and LLMs able to process larger amounts of input data (a larger "context window") can handle more complex inference tasks.[7] However, computational demands generally increase with the size of the input data and the context window the model can handle.[4]
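
The next-token prediction step can be illustrated with a minimal sketch. The toy vocabulary, the hand-picked scores, and the greedy selection below are illustrative assumptions rather than the behavior of any particular model:

  import numpy as np

  # Toy vocabulary and unnormalized scores ("logits") that a trained model
  # might assign to each candidate next token after the prompt "The sky is".
  vocab = ["blue", "green", "falling", "the"]
  logits = np.array([4.0, 1.5, 0.5, -1.0])   # illustrative values only

  # A softmax turns the logits into a probability distribution over the vocabulary.
  probabilities = np.exp(logits - logits.max())
  probabilities /= probabilities.sum()

  # The response is generated one token at a time; here the most probable
  # token is selected (greedy decoding).
  next_token = vocab[int(np.argmax(probabilities))]
  print(next_token)               # -> "blue"
  print(probabilities.round(3))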

Quadratic Time Complexity of Transformer Architecture

Since its introduction by Google in 2017, the transformer architecture has become the predominant paradigm for LLMs such as ChatGPT.[8] A foundational component of the transformer architecture is its attention mechanism, which operates by evaluating the dependencies between every pair of elements (tokens) in an input sequence.[9] As the sequence grows, the computation required to process these pairwise operations, and its associated cost, increases in proportion to the square of the number of elements in the sequence.[10]
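
The pairwise nature of this computation can be seen in a minimal sketch of scaled dot-product attention. The random matrices below stand in for the learned query, key, and value projections of the tokens; the sizes and values are toy assumptions:

  import numpy as np

  rng = np.random.default_rng(0)
  n, d = 6, 4                    # sequence length and embedding width (toy sizes)

  # Stand-ins for the learned query, key, and value projections of n tokens.
  Q = rng.standard_normal((n, d))
  K = rng.standard_normal((n, d))
  V = rng.standard_normal((n, d))

  # Every token is compared with every other token, producing an n x n score
  # matrix; this is the step whose cost grows with the square of the length.
  scores = Q @ K.T / np.sqrt(d)                              # shape (n, n)
  weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
  weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax

  output = weights @ V                                       # shape (n, d)
  print(scores.shape)   # (6, 6): the score matrix has n**2 entries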

In computer science, this is known as "quadratic time complexity," which limits the scalability of the models due to the high computational costs.[11] Researchers therefore have attempted to develop alternative approaches, such as Hyena, that significantly reduce computational costs by achieving "sub-quadratic time complexity."[12][13][14]
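
The practical gap between the two growth rates can be illustrated with a schematic comparison. The n*log2(n) curve below is a generic stand-in for sub-quadratic scaling of the kind achieved by FFT-based long convolutions; it is not an exact model of Hyena's runtime, and constant factors are ignored:

  import math

  # Schematic operation counts: quadratic growth (attention-style, ~n**2)
  # versus a sub-quadratic curve (~n * log2(n)). Constant factors and
  # hardware effects are deliberately ignored.
  for n in [1_000, 8_000, 64_000]:
      quadratic = n ** 2
      subquadratic = n * math.log2(n)
      print(f"n={n:>6}: n^2 = {quadratic:.2e}, "
            f"n*log2(n) = {subquadratic:.2e}, "
            f"ratio ~ {quadratic / subquadratic:,.0f}x")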

Long Convolutions and Gating

To achieve sub-quadratic time complexity, Hyena replaces the attention mechanism of transformer architectures "by interleaving implicitly parametrized long convolutions and data-controlled gating."[2] The long convolutions in Hyena use a feed-forward network to dynamically define the convolutional filters. This allows the model to handle very long sequences more efficiently than attention mechanisms.[2]
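
A minimal sketch of this idea follows: a small feed-forward network maps each position to a filter value, and the resulting long filter is applied with an FFT-based convolution whose cost grows roughly as n*log(n) rather than n squared. The network shape, random weights, and simple positional encoding are illustrative assumptions, not the parametrization used in Hyena:

  import numpy as np

  rng = np.random.default_rng(0)
  n = 1024                        # sequence length (toy size)
  x = rng.standard_normal(n)      # one channel of the input sequence

  # Implicit parametrization: instead of storing n filter weights directly,
  # a small feed-forward network maps each (normalized) position to a filter value.
  W1 = 0.5 * rng.standard_normal((16, 1))
  W2 = 0.5 * rng.standard_normal((1, 16))
  positions = np.linspace(0.0, 1.0, n).reshape(1, n)
  h = (W2 @ np.tanh(W1 @ positions)).ravel()   # implicit long filter, length n

  # The long (causal) convolution is evaluated with the FFT, so its cost
  # scales as n*log(n) instead of the n**2 cost of pairwise attention scores.
  fft_size = 2 * n
  y = np.fft.irfft(np.fft.rfft(x, fft_size) * np.fft.rfft(h, fft_size), fft_size)[:n]
  print(y.shape)   # (1024,)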

Data-controlled gating in the Hyena architecture is designed to control the influence of different parts of the input data on the model's output, dynamically adjusting based on the input data. This is achieved through multiplicative element-wise gating, where the input data is modulated by a gate that determines how much of the input should be passed through. The gate values are computed as a function of the input, allowing the model to adaptively focus on different parts of the input sequence, further adding to Hyena's efficiency.[2]
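
A minimal sketch of multiplicative element-wise gating is shown below, under the assumption of a single linear projection and a sigmoid squashing function; the exact projections and non-linearities used in Hyena differ:

  import numpy as np

  rng = np.random.default_rng(1)
  n, d = 8, 4                        # sequence length and channel width (toy sizes)
  x = rng.standard_normal((n, d))    # input sequence
  u = rng.standard_normal((n, d))    # signal to be gated, e.g. a convolution output

  # Data-controlled gate: the gate values are themselves a function of the input,
  # here a linear projection squashed into (0, 1) by a sigmoid.
  W_gate = 0.5 * rng.standard_normal((d, d))
  gate = 1.0 / (1.0 + np.exp(-(x @ W_gate)))   # shape (n, d), values in (0, 1)

  # Element-wise (multiplicative) gating: each entry of the signal is scaled by
  # how much the gate lets through at that position and channel.
  y = gate * u
  print(y.shape)   # (8, 4)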

Applications

The Hyena architecture has been employed in various applications, including:

  • HyenaDNA: HyenaDNA is an advanced genomic model designed to handle extremely long DNA sequences, up to 1 million tokens. It is designed to make genomic predictions, such as identifying regulatory elements and chromatin profiles, and supports in-context learning.[15]
  • StripedHyena: This model is designed for both natural language and biological sequence processing. It combines "rotary attention"—a technique to more efficiently handle longer text sequences—and gated convolutions arranged in Hyena blocks, offering improved scaling and efficiency over traditional transformer models.[16]
  • Evo-1-7B: This is a biological foundation model designed for long-context modeling and generative design across various biological modalities, including DNA, RNA, and proteins. Trained on a large prokaryotic whole-genome dataset, it can handle sequences at a single-nucleotide, byte-level resolution and is able to generate entire genomes. Evo-1-7B uses the StripedHyena architecture to achieve efficient computation and memory scaling.[17]

Performance

Hyena matched attention-based models in recall and reasoning tasks on sequences of thousands to hundreds of thousands of tokens. Hyena "set a new state-of-the-art for dense-attention-free architectures on language modeling in standard datasets (WikiText103 and The Pile)."[2] It reached "transformer quality with a 20% reduction in training compute required at sequence length 2K."[2] At sequence lengths of 8K, Hyena operators are "twice as fast as highly optimized attention, and 100x faster at sequence length 64K."[2]

References
  1. ^ "Hyena Hierarchy: Towards Larger Convolutional Language Models". hazyresearch.stanford.edu. Retrieved 2024-07-31.
  2. ^ a b c d e f g Poli, Michael; Massaroli, Stefano; Nguyen, Eric; Fu, Daniel Y.; Dao, Tri; Baccus, Stephen; Bengio, Yoshua; Ermon, Stefano; Ré, Christopher (2023-04-19), Hyena Hierarchy: Towards Larger Convolutional Language Models, arXiv:2302.10866, retrieved 2024-07-31
  3. ^ "This new technology could blow away GPT-4 and everything like it". ZDNET. Retrieved 2024-07-31.
  4. ^ a b Lee, Timothy B. "Large language models, explained with a minimum of math and jargon". www.understandingai.org. Retrieved 2024-07-31.
  5. ^ a b Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes (2022-03-29), Training Compute-Optimal Large Language Models, arXiv:2203.15556, retrieved 2024-08-01
  6. ^ "Home". Artificial Intelligence School. Retrieved 2024-08-01.
  7. ^ "Understanding LLMs: A Comprehensive Overview from Training to Inference". arxiv.org. Retrieved 2024-08-02.
  8. ^ Sajun, Ali Reza; Zualkernan, Imran; Sankalpa, Donthi (January 2024). "A Historical Survey of Advances in Transformer Architectures". Applied Sciences. 14 (10): 4316. doi:10.3390/app14104316. ISSN 2076-3417.
  9. ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (2023-08-01), Attention Is All You Need, arXiv:1706.03762, retrieved 2024-07-31
  10. ^ Toews, Rob. "Transformers Revolutionized AI. What Will Replace Them?". Forbes. Retrieved 2024-07-31.
  11. ^ Kacham, Praneeth; Mirrokni, Vahab; Zhong, Peilin (2024-03-17), PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels, arXiv:2310.01655, retrieved 2024-08-01
  12. ^ "The Efficiency Spectrum of Large Language Models: An Algorithmic Survey". arxiv.org. Retrieved 2024-08-02.
  13. ^ "The Safari of Deep Signal Processing: Hyena and Beyond". hazyresearch.stanford.edu. Retrieved 2024-08-01.
  14. ^ "Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models". arxiv.org. Retrieved 2024-08-02.
  15. ^ Nguyen, Eric; Poli, Michael; Faizi, Marjan; Thomas, Armin; Birch-Sykes, Callum; Wornow, Michael; Patel, Aman; Rabideau, Clayton; Massaroli, Stefano (2023-11-14), HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution, arXiv:2306.15794, PMID 37426456, retrieved 2024-07-31
  16. ^ Schreiner, Maximilian (2023-12-10). "StripedHyena: A new architecture for next-generation generative AI?". THE DECODER. Retrieved 2024-07-31.
  17. ^ Nguyen, Eric; Poli, Michael; Durrant, Matthew G.; Thomas, Armin W.; Kang, Brian; Sullivan, Jeremy; Ng, Madelena Y.; Lewis, Ashley; Patel, Aman (2024-02-27), Sequence modeling and design from molecular to genome scale with Evo, doi:10.1101/2024.02.27.582234, retrieved 2024-07-31