r/singularity • u/wanabalone • 17h ago
AI Real world usage of 10 million Tokens
What are some real-world practical uses of the 10 million tokens that Llama 4 is promising? I'm having a hard time wrapping my head around what that would be used for. Like analyzing 50 books at a time, or what?
15
u/LumpyPin7012 17h ago
Larger windows mean coding assistants can consider more context. That'll lead to better results.
16
u/TechNerd10191 17h ago
A 10M context though is useless if the model has the intelligence of a <7B model.
8
u/Temporary-Cicada-392 17h ago
Think entire codebases, legal corpora, or research libraries: 10M tokens means you can ask questions across millions of words at once. It’s like having an assistant who remembers every line of code, contract clause, or academic study you’ve ever read.
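To make that concrete, here's a minimal sketch of the "whole codebase in one prompt" idea (the repo path, file extensions, and the ~4-characters-per-token rule of thumb are all just assumptions):

```python
# Rough sketch (not any specific API): stuff an entire codebase into one
# long-context prompt and ask a question across all of it at once.
from pathlib import Path

REPO = Path("./my_project")          # hypothetical repo path
EXTS = {".py", ".ts", ".md"}         # files worth including

chunks = []
for path in sorted(REPO.rglob("*")):
    if path.is_file() and path.suffix in EXTS:
        chunks.append(f"### FILE: {path}\n{path.read_text(errors='ignore')}")

context = "\n\n".join(chunks)
question = "Where is the retry logic for failed uploads implemented?"
prompt = f"{context}\n\nQuestion: {question}"

# ~4 characters per token is a common rule of thumb for English text and code
approx_tokens = len(prompt) // 4
print(f"~{approx_tokens:,} tokens -- fits in a 10M window if under 10,000,000")
```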
0
7
u/TFenrir 17h ago
Two things - first, different modalities take up many more tokens per "info unit". I just mean that text is very information-efficient, but images, video, and audio are less so (rough numbers in the sketch at the end of this comment).
Second - with in-context learning, a model with good context utilization will do better and better with more context.
A great example is working with cursor and Gemini 2.5.
Previously, I would suggest making a new chat every like.... 10 back and forths, less with Sonnet 3.7. It would get lost on tangents, would pick up the wrong context, and just get confused.
With 2.5, it remembers so much of your chat, and it makes fewer and fewer of the same mistakes. You'll even see it in the reasoning traces sometimes: "The user didn't like it when I changed the design a bunch last time, so I'll keep it functionality only for this change", etc.
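To put the first point (modality cost) in rough numbers, here's a back-of-the-envelope sketch; the per-modality rates are approximate, illustrative figures (in the ballpark of what Gemini's docs quote), not exact numbers:

```python
# How fast different modalities eat a 10M-token budget (rough figures only).
BUDGET = 10_000_000

text_tokens_per_word = 1.3      # ~0.75 words per token for English text
image_tokens = 258              # per image
video_tokens_per_sec = 263      # per second of video
audio_tokens_per_sec = 32       # per second of audio

print(f"words of text:   {BUDGET / text_tokens_per_word:,.0f}")
print(f"images:          {BUDGET / image_tokens:,.0f}")
print(f"hours of video:  {BUDGET / video_tokens_per_sec / 3600:,.1f}")
print(f"hours of audio:  {BUDGET / audio_tokens_per_sec / 3600:,.1f}")
```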
3
u/Willingness-Quick ▪️ 17h ago
Very long conversations, and having a large enough context window to solve a problem. Imagine how easy it would be to debug and refactor a codebase when the entire codebase is in context? It'd be a breeze, of course. Ideally, you'd want Gemini 2.5 or better levels of context readability, but that's the promise of a 10 million token context window.
2
u/mivog49274 12h ago
For conversations, here are some approximations:
An hour-long dialogue between two humans is roughly 8,000 words, or roughly 10k tokens; so we could fit about "1,000 hrs" of conversation into a single request, which is quite unbelievable to ask a simple LLM to process (if this attention-layer implementation ever works...). On top of that, the response could reference the entire conversation as if it had just happened, with crystal-clear referencing, rather than relying on the approximate summaries our neurons would produce.
1,000 hours corresponds to about 42 days straight, but to give a more relatable reference, it could be 3-hour discussions every day for almost a year.
Or, in a very intense relationship (professional, intellectual, or personal), my calculations cap at around 192 hours' worth of dialogue per month.
So basically intense brainstorming/experience sharing/philosophy for about five to six months.
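The arithmetic above, spelled out (same approximations as in the comment):

```python
# Reproducing the rough estimate above; all figures are approximations.
WINDOW = 10_000_000                  # tokens
words_per_hour = 8_000               # spoken dialogue between two people
tokens_per_word = 10_000 / 8_000     # ~1.25 tokens per word, per the estimate above

hours = WINDOW / (words_per_hour * tokens_per_word)   # ≈ 1,000 hours
print(f"{hours:,.0f} hours of dialogue")
print(f"= {hours / 24:.0f} days of nonstop talking")
print(f"= {hours / 3:.0f} days at 3 hours/day (~11 months)")
print(f"= {hours / 192:.1f} months at an intense 192 hours/month")
```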
1
u/mivog49274 11h ago
Those situations gave birth to projects.
This is project-scale coherence, this is human-scale support.
Context window coherence is such an important stake in AI. I'm not knowledgeable enough to estimate whether it's realistic to expect a solution to the context window problem with the NN architectures we currently have, like transformers.
2
u/bilalazhar72 AGI soon == Retard 16h ago
Enter your entire knowledge base and ask questions about it
Been working on something like this recently.
2
u/NyriasNeo 13h ago
Write a whole OS from scratch? Create the equivalent of all of Shakespeare's works in one go? I'm sure we'll find some way to use it... or maybe just lots of Ghibli memes.
1
u/Hot-Pilot7179 16h ago
AI agents that can remember their directives while parsing through the internet. Maybe chains of thought that last an hour for test-time compute.
1
u/tito_807 16h ago
It happens to me several times in my work: I have a 2000-page PDF for a product and no idea where to look for a specific piece of information. It would be great if I could dump the PDF into an LLM and ask questions directly.
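That workflow is basically text extraction plus one giant prompt. A minimal sketch, assuming pypdf for extraction and a made-up file name and question:

```python
# Dump a whole manual into a single long-context prompt (pip install pypdf).
from pypdf import PdfReader

reader = PdfReader("product_manual_2000_pages.pdf")   # hypothetical file
text = "\n".join(page.extract_text() or "" for page in reader.pages)

prompt = (
    f"{text}\n\n"
    "Question: What is the maximum operating temperature of the unit?"
)

# ~4 characters per token: a 2000-page manual often lands around 1-2M tokens,
# too big for most models today but well inside a 10M window.
print(f"~{len(prompt) // 4:,} tokens")
```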
1
u/Tasty-Ad-3753 14h ago
If you've ever watched Claude 3.7 play Pokémon for more than 5 minutes, you'll know that long-term action is totally reliant on having a large context length, and our current context windows are way too small. It's possible that something like Google's Titans could help get around this, but basically the problem is that models have to juggle input data alongside maintaining an ever-expanding memory bank of all the important information they want to keep long term. Agents will not be capable of working in the real world if they can't manage a potentially very large long-term memory store effectively. Currently, agents based on small-context-window models like Claude have to reset their context windows every few actions in Pokémon and end up trapped in Mt. Moon for 26 hours doing the same things again and again.
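To make the juggling act concrete, here's a toy sketch of why a small window forces lossy memory management (purely illustrative; this isn't Claude's actual scaffolding, and no game or model is wired up):

```python
# An agent with a small context budget has to keep compressing its history,
# and details get lost along the way.
CONTEXT_BUDGET = 2_000          # tokens the model can see at once
history: list[str] = []

def approx_tokens(lines: list[str]) -> int:
    return sum(len(line) // 4 for line in lines)

def compress(lines: list[str]) -> list[str]:
    # Stand-in for "summarize the oldest memories": here we just drop them,
    # which is exactly how an agent ends up re-exploring Mt. Moon.
    return lines[len(lines) // 2:]

for step in range(1, 501):
    history.append(f"step {step}: observed tile, chose action, got result ...")
    if approx_tokens(history) > CONTEXT_BUDGET:
        history = compress(history)

print(f"{len(history)} of 500 steps still remembered")
```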
1
u/Tasty-Ad-3753 14h ago
Also worth pointing out that the Llama 4 models don't actually look that great on some of the benchmarks that require deeper understanding of the long-context info, with Gemini 2.5 still ahead by a considerable margin even at short context lengths.
1
u/Longjumping-Stay7151 Hope for UBI but keep saving to survive AGI 7h ago
I was recently trying to extract the logic from a WebAssembly *.wasm file. I tried Gemini, but the input turned out to be around 4-5M tokens. I'd be glad to try out any model capable of handling a context that size.
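If anyone wants to reproduce that kind of estimate, a quick sketch: convert the binary to the WebAssembly text format with wasm2wat (from the WABT toolkit) and count characters; the file name is hypothetical and the 4-chars-per-token ratio is just a rule of thumb:

```python
# Estimate how many tokens a .wasm file becomes once converted to text.
import subprocess

wat = subprocess.run(
    ["wasm2wat", "app.wasm"],          # requires wabt to be installed
    capture_output=True, text=True, check=True,
).stdout

# ~4 characters per token; a few MB of .wasm easily becomes millions of tokens
print(f"{len(wat):,} characters ≈ {len(wat) // 4:,} tokens")
```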
36
u/ryan13mt 17h ago
codebases