Draft:Hyena (technology)

Hyena is an architecture for building large language models (LLMs), a form of generative artificial intelligence, developed by researchers at Stanford University and released in March 2023.[1] It was designed as a computationally more efficient alternative to the transformer architecture that underlies popular LLMs such as ChatGPT from OpenAI, Gemini from Google, Claude from Anthropic, and Llama from Meta. Hyena replaces the attention mechanism of the transformer architecture with a combination of long convolutions and data-controlled gating mechanisms. This allows Hyena to achieve sub-quadratic time complexity, reducing computational costs compared with transformer-based models while demonstrating similar performance.[2][3][4]

Large Language Models

Large language models work by first being trained on massive amounts of text gathered from a variety of sources.[5] During the training phase, this "training data" is ingested into the model and processed as "tokens": pieces of text as short as a single character or as long as a word or part of a word.[6] Once trained, the model can be given a prompt or query ("input") in the inference phase and then generate a response ("output") by predicting the next token in a sequence.[5] Generally, LLMs trained on larger data sets perform better than those trained on smaller data sets, and LLMs able to process larger amounts of input data (a larger "context window") can handle more complex inference tasks.[7] However, computational demands generally increase with the size of the input data and the context window the model can handle.[4]
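
The next-token prediction step can be illustrated with a minimal sketch. The toy vocabulary, the hand-picked scores, and the greedy selection below are illustrative assumptions rather than the behavior of any particular model:

  import numpy as np

  # Toy vocabulary and unnormalized scores ("logits") that a trained model
  # might assign to each candidate next token after the prompt "The sky is".
  vocab = ["blue", "green", "falling", "the"]
  logits = np.array([4.0, 1.5, 0.5, -1.0])   # illustrative values only

  # A softmax turns the logits into a probability distribution over the vocabulary.
  probabilities = np.exp(logits - logits.max())
  probabilities /= probabilities.sum()

  # The response is generated one token at a time; here the most probable
  # token is selected (greedy decoding).
  next_token = vocab[int(np.argmax(probabilities))]
  print(next_token)               # -> "blue"
  print(probabilities.round(3))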

Quadratic Time Complexity of Transformer Architecture

Since its introduction by Google in 2017, the transformer architecture has become the predominant paradigm for LLMs such as ChatGPT.[8] A foundational component of the transformer architecture is its attention mechanism, which operates by evaluating the dependencies between every pair of elements (tokens) in an input sequence.[9] As the sequence grows, the computation required to process these pairwise operations, and its associated cost, increases in proportion to the square of the number of elements in the sequence.[10]
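
The pairwise nature of this computation can be seen in a minimal sketch of scaled dot-product attention. The random matrices below stand in for the learned query, key, and value projections of the tokens; the sizes and values are toy assumptions:

  import numpy as np

  rng = np.random.default_rng(0)
  n, d = 6, 4                    # sequence length and embedding width (toy sizes)

  # Stand-ins for the learned query, key, and value projections of n tokens.
  Q = rng.standard_normal((n, d))
  K = rng.standard_normal((n, d))
  V = rng.standard_normal((n, d))

  # Every token is compared with every other token, producing an n x n score
  # matrix; this is the step whose cost grows with the square of the length.
  scores = Q @ K.T / np.sqrt(d)                              # shape (n, n)
  weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
  weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax

  output = weights @ V                                       # shape (n, d)
  print(scores.shape)   # (6, 6): the score matrix has n**2 entries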

In computer science, this is known as "quadratic time complexity," which limits the scalability of the models due to the high computational costs.[11] Researchers therefore have attempted to develop alternative approaches, such as Hyena, that significantly reduce computational costs by achieving "sub-quadratic time complexity."[12][13][14]
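
The practical gap between the two growth rates can be illustrated with a schematic comparison. The n*log2(n) curve below is a generic stand-in for sub-quadratic scaling of the kind achieved by FFT-based long convolutions; it is not an exact model of Hyena's runtime, and constant factors are ignored:

  import math

  # Schematic operation counts: quadratic growth (attention-style, ~n**2)
  # versus a sub-quadratic curve (~n * log2(n)). Constant factors and
  # hardware effects are deliberately ignored.
  for n in [1_000, 8_000, 64_000]:
      quadratic = n ** 2
      subquadratic = n * math.log2(n)
      print(f"n={n:>6}: n^2 = {quadratic:.2e}, "
            f"n*log2(n) = {subquadratic:.2e}, "
            f"ratio ~ {quadratic / subquadratic:,.0f}x")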

Long Convolutions and Gating

To achieve sub-quadratic time complexity, Hyena replaces the attention mechanism of transformer architectures "by interleaving implicitly parametrized long convolutions and data-controlled gating."[2] The long convolutions in Hyena use a feed-forward network to dynamically define the convolutional filters. This allows the model to handle very long sequences more efficiently than attention mechanisms.[2]
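
A minimal sketch of this idea follows: a small feed-forward network maps each position to a filter value, and the resulting long filter is applied with an FFT-based convolution whose cost grows roughly as n*log(n) rather than n squared. The network shape, random weights, and simple positional encoding are illustrative assumptions, not the parametrization used in Hyena:

  import numpy as np

  rng = np.random.default_rng(0)
  n = 1024                        # sequence length (toy size)
  x = rng.standard_normal(n)      # one channel of the input sequence

  # Implicit parametrization: instead of storing n filter weights directly,
  # a small feed-forward network maps each (normalized) position to a filter value.
  W1 = 0.5 * rng.standard_normal((16, 1))
  W2 = 0.5 * rng.standard_normal((1, 16))
  positions = np.linspace(0.0, 1.0, n).reshape(1, n)
  h = (W2 @ np.tanh(W1 @ positions)).ravel()   # implicit long filter, length n

  # The long (causal) convolution is evaluated with the FFT, so its cost
  # scales as n*log(n) instead of the n**2 cost of pairwise attention scores.
  fft_size = 2 * n
  y = np.fft.irfft(np.fft.rfft(x, fft_size) * np.fft.rfft(h, fft_size), fft_size)[:n]
  print(y.shape)   # (1024,)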

Data-controlled gating in the Hyena architecture is designed to control the influence of different parts of the input data on the model's output, dynamically adjusting based on the input data. This is achieved through multiplicative element-wise gating, where the input data is modulated by a gate that determines how much of the input should be passed through. The gate values are computed as a function of the input, allowing the model to adaptively focus on different parts of the input sequence, further adding to Hyena's efficiency.[2]
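
A minimal sketch of multiplicative element-wise gating is shown below, under the assumption of a single linear projection and a sigmoid squashing function; the exact projections and non-linearities used in Hyena differ:

  import numpy as np

  rng = np.random.default_rng(1)
  n, d = 8, 4                        # sequence length and channel width (toy sizes)
  x = rng.standard_normal((n, d))    # input sequence
  u = rng.standard_normal((n, d))    # signal to be gated, e.g. a convolution output

  # Data-controlled gate: the gate values are themselves a function of the input,
  # here a linear projection squashed into (0, 1) by a sigmoid.
  W_gate = 0.5 * rng.standard_normal((d, d))
  gate = 1.0 / (1.0 + np.exp(-(x @ W_gate)))   # shape (n, d), values in (0, 1)

  # Element-wise (multiplicative) gating: each entry of the signal is scaled by
  # how much the gate lets through at that position and channel.
  y = gate * u
  print(y.shape)   # (8, 4)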

Applications

The Hyena architecture has been employed in various applications, including:

  • HyenaDNA: HyenaDNA is an advanced genomic model designed to handle extremely long DNA sequences, up to 1 million tokens. It is designed to make genomic predictions, such as identifying regulatory elements and chromatin profiles, and supports in-context learning.[15]
  • StripedHyena: This model is designed for both natural language and biological sequence processing. It combines "rotary attention"—a technique to more efficiently handle longer text sequences—and gated convolutions arranged in Hyena blocks, offering improved scaling and efficiency over traditional transformer models.[16]
  • Evo-1-7B: This is a biological foundation model designed for long-context modeling and generative design across various biological modalities, including DNA, RNA, and proteins. Trained on a large prokaryotic whole-genome dataset, it can handle sequences at a single-nucleotide, byte-level resolution and is able to generate entire genomes. Evo-1-7B uses the StripedHyena architecture to achieve efficient computation and memory scaling.[17]

Performance

Hyena matched attention-based models in recall and reasoning tasks on sequences of thousands to hundreds of thousands of tokens. Hyena "set a new state-of-the-art for dense-attention-free architectures on language modeling in standard datasets (WikiText103 and The Pile)."[2] It reached "transformer quality with a 20% reduction in training compute required at sequence length 2K."[2] At sequence lengths of 8K, Hyena operators are "twice as fast as highly optimized attention, and 100x faster at sequence length 64K."[2]

References
  1. ^ "Hyena Hierarchy: Towards Larger Convolutional Language Models". hazyresearch.stanford.edu. Retrieved 2024-07-31.
  2. ^ a b c d e f g Poli, Michael; Massaroli, Stefano; Nguyen, Eric; Fu, Daniel Y.; Dao, Tri; Baccus, Stephen; Bengio, Yoshua; Ermon, Stefano; Ré, Christopher (2023-04-19), Hyena Hierarchy: Towards Larger Convolutional Language Models, arXiv:2302.10866, retrieved 2024-07-31
  3. ^ "This new technology could blow away GPT-4 and everything like it". ZDNET. Retrieved 2024-07-31.
  4. ^ a b Lee, Timothy B. "Large language models, explained with a minimum of math and jargon". www.understandingai.org. Retrieved 2024-07-31.
  5. ^ a b Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes (2022-03-29), Training Compute-Optimal Large Language Models, arXiv:2203.15556, retrieved 2024-08-01
  6. ^ "Home". Artificial Intelligence School. Retrieved 2024-08-01.
  7. ^ "Understanding LLMs: A Comprehensive Overview from Training to Inference". arxiv.org. Retrieved 2024-08-02.
  8. ^ Sajun, Ali Reza; Zualkernan, Imran; Sankalpa, Donthi (January 2024). "A Historical Survey of Advances in Transformer Architectures". Applied Sciences. 14 (10): 4316. doi:10.3390/app14104316. ISSN 2076-3417.
  9. ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (2023-08-01), Attention Is All You Need, arXiv:1706.03762, retrieved 2024-07-31
  10. ^ Toews, Rob. "Transformers Revolutionized AI. What Will Replace Them?". Forbes. Retrieved 2024-07-31.
  11. ^ Kacham, Praneeth; Mirrokni, Vahab; Zhong, Peilin (2024-03-17), PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels, arXiv:2310.01655, retrieved 2024-08-01
  12. ^ "The Efficiency Spectrum of Large Language Models: An Algorithmic Survey". arxiv.org. Retrieved 2024-08-02.
  13. ^ "The Safari of Deep Signal Processing: Hyena and Beyond". hazyresearch.stanford.edu. Retrieved 2024-08-01.
  14. ^ "Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models". arxiv.org. Retrieved 2024-08-02.
  15. ^ Nguyen, Eric; Poli, Michael; Faizi, Marjan; Thomas, Armin; Birch-Sykes, Callum; Wornow, Michael; Patel, Aman; Rabideau, Clayton; Massaroli, Stefano (2023-11-14), HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution, arXiv:2306.15794, PMID 37426456, retrieved 2024-07-31
  16. ^ Schreiner, Maximilian (2023-12-10). "StripedHyena: A new architecture for next-generation generative AI?". THE DECODER. Retrieved 2024-07-31.
  17. ^ Nguyen, Eric; Poli, Michael; Durrant, Matthew G.; Thomas, Armin W.; Kang, Brian; Sullivan, Jeremy; Ng, Madelena Y.; Lewis, Ashley; Patel, Aman (2024-02-27), Sequence modeling and design from molecular to genome scale with Evo, doi:10.1101/2024.02.27.582234, retrieved 2024-07-31