attention and long context
Efficient Streaming Language Models with Attention Sinks (Paper, arXiv:2309.17453)
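The attention-sinks idea above can be sketched as a KV-cache eviction policy: keep the first few "sink" tokens forever plus a sliding window of recent tokens, evicting everything in between. A minimal illustrative sketch (the class and parameter names are assumptions, not the paper's code):

```python
from collections import deque

class SinkKVCache:
    """Illustrative sketch of an attention-sink KV cache (not the paper's code).

    Keeps the first `n_sink` positions permanently plus the most recent
    `window` positions; entries in between are evicted.
    """

    def __init__(self, n_sink=4, window=1020):
        self.n_sink = n_sink
        self.sinks = []                     # KV entries for the first tokens
        self.recent = deque(maxlen=window)  # sliding window of recent KV entries

    def append(self, kv):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)          # deque drops the oldest automatically

    def visible_entries(self):
        # Entries the next attention step may attend to.
        return self.sinks + list(self.recent)

cache = SinkKVCache(n_sink=4, window=8)
for t in range(100):
    cache.append(t)  # stand-in for a real (key, value) tensor pair
```

After 100 tokens the cache holds the 4 sink positions plus the last 8, so memory stays constant regardless of stream length.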
Effective Long-Context Scaling of Foundation Models (Paper, arXiv:2309.16039)
allenai/longformer-base-4096 (Model)
google/bigbird-roberta-base (Model)
… (Model, Fill-Mask)
Yukang/Llama-2-7b-longlora-100k-ft (Model, Text Generation)
RRWKV: Capturing Long-range Dependencies in RWKV (Paper, arXiv:2306.05176)
Retentive Network: A Successor to Transformer for Large Language Models (Paper, arXiv:2307.08621)
Hyena Hierarchy: Towards Larger Convolutional Language Models (Paper, arXiv:2302.10866)
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution (Paper, arXiv:2306.15794)
Hungry Hungry Hippos: Towards Language Modeling with State Space Models (Paper, arXiv:2212.14052)
Ring Attention with Blockwise Transformers for Near-Infinite Context (Paper, arXiv:2310.01889)
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (Paper, arXiv:2309.12307)
CoLT5: Faster Long-Range Transformers with Conditional Computation (Paper, arXiv:2303.09752)
LongT5: Efficient Text-To-Text Transformer for Long Sequences (Paper, arXiv:2112.07916)
Investigating Efficiently Extending Transformers for Long Input Summarization (Paper, arXiv:2208.04347)
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (Paper, arXiv:2108.12409)
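The ALiBi entry above has a formula simple enough to sketch: each head adds a linear penalty of -m * (i - j) to the attention score between query position i and key position j, with per-head slopes m forming the geometric sequence 2^(-8/n), 2^(-16/n), ... for n heads. A minimal sketch (function names are illustrative, and this assumes the number of heads is a power of two, as the paper's main recipe does):

```python
def alibi_slopes(n_heads):
    """Per-head ALiBi slopes: 2^(-8*1/n), 2^(-8*2/n), ..., 2^(-8*n/n),
    assuming n_heads is a power of two."""
    return [2 ** (-8 * (k + 1) / n_heads) for k in range(n_heads)]

def alibi_bias(slope, seq_len):
    """Causal bias matrix for one head: query i attends to key j <= i with
    penalty -slope * (i - j); future positions are masked with -inf."""
    return [[-slope * (i - j) if j <= i else float("-inf")
             for j in range(seq_len)]
            for i in range(seq_len)]

slopes = alibi_slopes(8)          # e.g. 8 attention heads: 1/2, 1/4, ..., 1/256
bias = alibi_bias(slopes[0], 4)   # added to scores before softmax
```

Because the penalty depends only on query-key distance, not absolute position, the model extrapolates to sequences longer than those seen in training.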
… (Model, Text Generation)
NousResearch/Yarn-Mistral-7b-128k (Model, Text Generation)
YaRN: Efficient Context Window Extension of Large Language Models (Paper, arXiv:2309.00071)
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (Paper, arXiv:2402.13753)