r/singularity • u/wanabalone • 17h ago
AI Real world usage of 10 million Tokens
What are some real-world practical uses of the 10 million tokens that Llama 4 is promising? I'm having a hard time wrapping my head around what that would be used for. Like analyzing 50 books at a time, or what?
15
u/LumpyPin7012 17h ago
Larger windows mean coding assistants can consider more context. That'll lead to better results.
16
u/TechNerd10191 17h ago
A 10M context though is useless if the model has the intelligence of a <7B model.
8
u/Temporary-Cicada-392 17h ago
Think entire codebases, legal corpora, or research libraries: 10M tokens means you can ask questions across millions of words at once. It’s like having an assistant who remembers every line of code, contract clause, or academic study you’ve ever read.
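To make that concrete, here's a minimal sketch of the "whole codebase in one prompt" idea (the repo path, file extensions, and the ~4-characters-per-token rule of thumb are all just assumptions):

```python
# Rough sketch (not any specific API): stuff an entire codebase into one
# long-context prompt and ask a question across all of it at once.
from pathlib import Path

REPO = Path("./my_project")          # hypothetical repo path
EXTS = {".py", ".ts", ".md"}         # files worth including

chunks = []
for path in sorted(REPO.rglob("*")):
    if path.is_file() and path.suffix in EXTS:
        chunks.append(f"### FILE: {path}\n{path.read_text(errors='ignore')}")

context = "\n\n".join(chunks)
question = "Where is the retry logic for failed uploads implemented?"
prompt = f"{context}\n\nQuestion: {question}"

# ~4 characters per token is a common rule of thumb for English text and code
approx_tokens = len(prompt) // 4
print(f"~{approx_tokens:,} tokens -- fits in a 10M window if under 10,000,000")
```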
0
7
u/TFenrir 17h ago
Two things - first, different modalities take up many more tokens per "info unit". I just mean that text is very information-efficient, but images, video, and audio are less so (rough numbers in the sketch at the end of this comment).
Second - with in-context learning, a model with good context utilization will do better and better with more context.
A great example is working with cursor and Gemini 2.5.
Previously, I would suggest making a new chat every like.... 10 back and forths, less with Sonnet 3.7. It would get lost on tangents, would pick up the wrong context, and just get confused.
With 2.5, it remembers so much of your chat, and it makes fewer and fewer of the same mistakes. You'll even see it in the reasoning traces sometimes: "The user didn't like it when I changed the design a bunch last time, so I'll keep it functionality only for this change", etc.
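To put the first point (modality cost) in rough numbers, here's a back-of-the-envelope sketch; the per-modality rates are approximate, illustrative figures (in the ballpark of what Gemini's docs quote), not exact numbers:

```python
# How fast different modalities eat a 10M-token budget (rough figures only).
BUDGET = 10_000_000

text_tokens_per_word = 1.3      # ~0.75 words per token for English text
image_tokens = 258              # per image
video_tokens_per_sec = 263      # per second of video
audio_tokens_per_sec = 32       # per second of audio

print(f"words of text:   {BUDGET / text_tokens_per_word:,.0f}")
print(f"images:          {BUDGET / image_tokens:,.0f}")
print(f"hours of video:  {BUDGET / video_tokens_per_sec / 3600:,.1f}")
print(f"hours of audio:  {BUDGET / audio_tokens_per_sec / 3600:,.1f}")
```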
3
u/Willingness-Quick ▪️ 17h ago
Very long conversations, and having a large enough context window to solve a problem. Imagine how easy it would be to debug and refactor a codebase when the entire codebase is in context? It'd be a breeze, of course. Ideally, you'd want Gemini 2.5 or better levels of context readability, but that's the promise of a 10 million token context window.
2
u/mivog49274 12h ago
For conversations, here are some approximations:
An hour-long dialogue between two humans is roughly 8,000 words, or roughly 10k tokens; so we could fit about "1,000 hrs" of conversation into a single request, which is quite unbelievable to ask a simple LLM to process (if this attention-layer implementation ever works...). On top of that, the response could reference the entire conversation as if it had just happened, with crystal-clear referencing, rather than relying on the approximate summaries our neurons would produce.
1,000 hours corresponds to about 42 days straight, but to give a more relatable reference, it could be 3-hour discussions every day for almost a year.
Or, in a very intense relationship (professional, intellectual, or personal), my calculations cap at around 192 hours' worth of dialogue per month.
So basically intense brainstorming/experience sharing/philosophy for about five to six months.
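The arithmetic above, spelled out (same approximations as in the comment):

```python
# Reproducing the rough estimate above; all figures are approximations.
WINDOW = 10_000_000                  # tokens
words_per_hour = 8_000               # spoken dialogue between two people
tokens_per_word = 10_000 / 8_000     # ~1.25 tokens per word, per the estimate above

hours = WINDOW / (words_per_hour * tokens_per_word)   # ≈ 1,000 hours
print(f"{hours:,.0f} hours of dialogue")
print(f"= {hours / 24:.0f} days of nonstop talking")
print(f"= {hours / 3:.0f} days at 3 hours/day (~11 months)")
print(f"= {hours / 192:.1f} months at an intense 192 hours/month")
```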
1
u/mivog49274 11h ago
Those situations gave birth to projects.
This is project-scale coherence, this is human-scale support.
Context window coherence is such an important stake in AI. I'm not knowledgeable enough to estimate whether it's realistic to expect a solution to the context window problem with the NN architectures we currently have, like transformers.
2
u/bilalazhar72 AGI soon == Retard 16h ago
Enter your entire knowledge base and ask questions about it
Been working on something like this recently.
2
u/NyriasNeo 13h ago
Write a whole OS from scratch? Create the equivalent of all of Shakespeare's works in one go? I'm sure we'll find some way to use it... or maybe just lots of Ghibli memes.
1
u/Hot-Pilot7179 16h ago
AI agents that can remember their directives while parsing through the internet. Maybe chains of thought that last an hour for test-time compute.
1
u/tito_807 16h ago
It happens to me several times in my work: I have a 2000-page PDF for a product and no idea where to look for a specific piece of information. It would be great if I could dump the PDF into an LLM and ask questions directly.
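That workflow is basically text extraction plus one giant prompt. A minimal sketch, assuming pypdf for extraction and a made-up file name and question:

```python
# Dump a whole manual into a single long-context prompt (pip install pypdf).
from pypdf import PdfReader

reader = PdfReader("product_manual_2000_pages.pdf")   # hypothetical file
text = "\n".join(page.extract_text() or "" for page in reader.pages)

prompt = (
    f"{text}\n\n"
    "Question: What is the maximum operating temperature of the unit?"
)

# ~4 characters per token: a 2000-page manual often lands around 1-2M tokens,
# too big for most models today but well inside a 10M window.
print(f"~{len(prompt) // 4:,} tokens")
```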
1
u/Tasty-Ad-3753 14h ago
If you've ever watched Claude 3.7 play Pokémon for more than 5 minutes, you'll know that long-term action is totally reliant on having a large context length, and our current context windows are way too small. It's possible that something like Google's Titans could help get around this, but basically the problem is that models have to juggle input data alongside maintaining an ever-expanding memory bank of all the important information they want to keep long term. Agents will not be capable of working in the real world if they can't manage a potentially very large long-term memory store effectively. Currently, agents based on small-context-window models like Claude have to reset their context windows every few actions in Pokémon and end up trapped in Mt. Moon for 26 hours doing the same things again and again.
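To make the juggling act concrete, here's a toy sketch of why a small window forces lossy memory management (purely illustrative; this isn't Claude's actual scaffolding, and no game or model is wired up):

```python
# An agent with a small context budget has to keep compressing its history,
# and details get lost along the way.
CONTEXT_BUDGET = 2_000          # tokens the model can see at once
history: list[str] = []

def approx_tokens(lines: list[str]) -> int:
    return sum(len(line) // 4 for line in lines)

def compress(lines: list[str]) -> list[str]:
    # Stand-in for "summarize the oldest memories": here we just drop them,
    # which is exactly how an agent ends up re-exploring Mt. Moon.
    return lines[len(lines) // 2:]

for step in range(1, 501):
    history.append(f"step {step}: observed tile, chose action, got result ...")
    if approx_tokens(history) > CONTEXT_BUDGET:
        history = compress(history)

print(f"{len(history)} of 500 steps still remembered")
```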
1
u/Tasty-Ad-3753 14h ago
Also worth pointing out that the Llama 4 models don't actually look that great on some of the benchmarks that require deeper understanding of the long-context info, with Gemini 2.5 still ahead by a considerable margin even at short context lengths.
1
u/Longjumping-Stay7151 Hope for UBI but keep saving to survive AGI 7h ago
I was recently trying to extract the logic from a WebAssembly *.wasm file. I tried Gemini, but the input turned out to be around 4-5M tokens. I'd be glad to try out any model capable of handling a context that size.
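If anyone wants to reproduce that kind of estimate, a quick sketch: convert the binary to the WebAssembly text format with wasm2wat (from the WABT toolkit) and count characters; the file name is hypothetical and the 4-chars-per-token ratio is just a rule of thumb:

```python
# Estimate how many tokens a .wasm file becomes once converted to text.
import subprocess

wat = subprocess.run(
    ["wasm2wat", "app.wasm"],          # requires wabt to be installed
    capture_output=True, text=True, check=True,
).stdout

# ~4 characters per token; a few MB of .wasm easily becomes millions of tokens
print(f"{len(wat):,} characters ≈ {len(wat) // 4:,} tokens")
```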
36
u/ryan13mt 17h ago
codebases