attention and long context
Efficient Streaming Language Models with Attention Sinks (Paper, arXiv:2309.17453)
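The attention-sinks idea above can be sketched as a KV-cache eviction policy: keep the first few "sink" tokens forever plus a sliding window of recent tokens, evicting everything in between. A minimal illustrative sketch (the class and parameter names are assumptions, not the paper's code):

```python
from collections import deque

class SinkKVCache:
    """Illustrative sketch of an attention-sink KV cache (not the paper's code).

    Keeps the first `n_sink` positions permanently plus the most recent
    `window` positions; entries in between are evicted.
    """

    def __init__(self, n_sink=4, window=1020):
        self.n_sink = n_sink
        self.sinks = []                     # KV entries for the first tokens
        self.recent = deque(maxlen=window)  # sliding window of recent KV entries

    def append(self, kv):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)          # deque drops the oldest automatically

    def visible_entries(self):
        # Entries the next attention step may attend to.
        return self.sinks + list(self.recent)

cache = SinkKVCache(n_sink=4, window=8)
for t in range(100):
    cache.append(t)  # stand-in for a real (key, value) tensor pair
```

After 100 tokens the cache holds the 4 sink positions plus the last 8, so memory stays constant regardless of stream length.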
Effective Long-Context Scaling of Foundation Models (Paper, arXiv:2309.16039)
allenai/longformer-base-4096 (Model)
google/bigbird-roberta-base (Model)
… (Model, Fill-Mask)
Yukang/Llama-2-7b-longlora-100k-ft (Model, Text Generation)
RRWKV: Capturing Long-range Dependencies in RWKV (Paper, arXiv:2306.05176)
Retentive Network: A Successor to Transformer for Large Language Models (Paper, arXiv:2307.08621)
Hyena Hierarchy: Towards Larger Convolutional Language Models (Paper, arXiv:2302.10866)
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution (Paper, arXiv:2306.15794)
Hungry Hungry Hippos: Towards Language Modeling with State Space Models (Paper, arXiv:2212.14052)
Ring Attention with Blockwise Transformers for Near-Infinite Context (Paper, arXiv:2310.01889)
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (Paper, arXiv:2309.12307)
CoLT5: Faster Long-Range Transformers with Conditional Computation (Paper, arXiv:2303.09752)
LongT5: Efficient Text-To-Text Transformer for Long Sequences (Paper, arXiv:2112.07916)
Investigating Efficiently Extending Transformers for Long Input Summarization (Paper, arXiv:2208.04347)
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (Paper, arXiv:2108.12409)
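The ALiBi entry above has a formula simple enough to sketch: each head adds a linear penalty of -m * (i - j) to the attention score between query position i and key position j, with per-head slopes m forming the geometric sequence 2^(-8/n), 2^(-16/n), ... for n heads. A minimal sketch (function names are illustrative, and this assumes the number of heads is a power of two, as the paper's main recipe does):

```python
def alibi_slopes(n_heads):
    """Per-head ALiBi slopes: 2^(-8*1/n), 2^(-8*2/n), ..., 2^(-8*n/n),
    assuming n_heads is a power of two."""
    return [2 ** (-8 * (k + 1) / n_heads) for k in range(n_heads)]

def alibi_bias(slope, seq_len):
    """Causal bias matrix for one head: query i attends to key j <= i with
    penalty -slope * (i - j); future positions are masked with -inf."""
    return [[-slope * (i - j) if j <= i else float("-inf")
             for j in range(seq_len)]
            for i in range(seq_len)]

slopes = alibi_slopes(8)          # e.g. 8 attention heads: 1/2, 1/4, ..., 1/256
bias = alibi_bias(slopes[0], 4)   # added to scores before softmax
```

Because the penalty depends only on query-key distance, not absolute position, the model extrapolates to sequences longer than those seen in training.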
… (Model, Text Generation)
NousResearch/Yarn-Mistral-7b-128k (Model, Text Generation)
YaRN: Efficient Context Window Extension of Large Language Models (Paper, arXiv:2309.00071)
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (Paper, arXiv:2402.13753)