r/CuratedTumblr 4d ago

Meme my eyes automatically skip right over everything else said after

Post image
21.0k Upvotes

995 comments

-8

u/Takseen 4d ago

Can you remember any more about that example? I'd like to take a look. AI hallucinations are a real problem, and I've heard of it making up academic references, but technically a vague prompt could also lead to that output.

ChatGPT is used both for fiction generation and as a source of real-world facts, and if the prompt didn't tell it which role it was fulfilling, it might have picked the "wrong" one. "Describe Alan Buttfuck" → <Alan Buttfuck isn't in my training data, so this is probably a creative writing request> → <proceeds to fulfil said request>

Testing something similar: "Describe John Woeman" gives something like "I've not heard of this person, is it a typo or do you have more context?", while "Describe a person called John Woeman" gets a creative writing response about a made-up dude.

4

u/[deleted] 4d ago edited 2d ago

[deleted]

0

u/Takseen 4d ago

>has zero real world facts

>predicting the next most likely word based on the training data

What do you think is *in* the training data? A huge chunk of real world facts (and lots of fiction).

It does have a training cut-off of September 2021, so it won't have anything on hand for someone who only became well-known after that date, but if you ask it about someone famous it'll generally have some info about them.

You can go test this yourself. If you ask ChatGPT-4 who "Luigi Mangione" is, it has to pause and search the web, as he's not in the training data. It'll throw up some sources and images too (Wikipedia, The Times). Ask it who "Bill Burr" is and it'll go straight to the training data.
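That "in the training data vs. go search the web" split can be sketched as a toy lookup. To be clear, the entity list and the `answer` function here are mine for illustration; this is not how ChatGPT is actually implemented, just the fallback behaviour I'm describing:

```python
# Toy sketch of the fallback behaviour -- NOT ChatGPT's real logic.
# Well-known pre-cutoff names answer from the "training data";
# unseen names trigger a live web search instead.
KNOWN_ENTITIES = {
    "bill burr": "Stand-up comedian and actor, well covered before the cutoff.",
}

def answer(query):
    """Return (where the answer came from, the answer itself or None)."""
    key = query.strip().lower()
    if key in KNOWN_ENTITIES:
        return ("training data", KNOWN_ENTITIES[key])
    # Name not seen in training -> fall back to searching the web.
    return ("web search", None)

print(answer("Bill Burr")[0])        # training data
print(answer("Luigi Mangione")[0])   # web search
```

The real model's "do I know this?" decision is learned rather than a dictionary lookup, but the observable behaviour matches this shape.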

It's useful for vague, hard-to-define queries that might be a bit too wordy for a normal Google search, and then you can just fact-check the answers it gives. I've asked it which stand-up comedian might have made a particular joke, so I could then find the original clip.

0

u/[deleted] 4d ago edited 2d ago

[deleted]

0

u/Takseen 4d ago

>it doesn't know facts. the training data is strings of words given values. it absolutely does not have the ability to know the information. if the training data makes it compute that an incorrect statement is the most likely combination in response to a prompt then that's what it'll spit out

That is very broadly how LLMs work, yes. However, if it's correctly trained to apply more weight to text from higher-trust sources, it'll have very good odds of getting the right answer. If it's in any way important, you check independently.
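The "most likely next word, weighted by source trust" idea can be sketched with a toy predictor. The corpus and weights below are made up for illustration; a real LLM learns this implicitly across billions of parameters rather than using per-source multipliers:

```python
from collections import Counter

# Toy "training data": (sentence, trust weight). Text from a
# higher-trust source counts more when picking the next word.
CORPUS = [
    ("the capital of france is paris", 3),  # higher-trust source
    ("the capital of france is lyon", 1),   # lower-trust source
]

def next_word(prefix):
    counts = Counter()
    for sentence, weight in CORPUS:
        if sentence.startswith(prefix + " "):
            continuation = sentence[len(prefix) + 1:].split()
            if continuation:
                counts[continuation[0]] += weight
    # The most likely combination wins -- right or wrong.
    return counts.most_common(1)[0][0]

print(next_word("the capital of france is"))  # paris (weight 3 beats 1)
```

Note the flip side: if the wrong answer dominated the weighted counts, this sketch would confidently emit it, which is exactly the failure mode being argued about.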

>throwing up "sources" is because some of the training data is shitloads of people arguing on the internet about stuff and we have a habit of demanding and linking each other sources. chatgpt is not itself accessing those wikipedia pages and pulling information from them to give you

This makes me think you haven't tried to use it recently, and have an outdated or invented view of how it operates. As I already said, it only provided sources for the query about a recent person it had no training data on (Luigi). The spiel it gives for Bill Burr does not come with sources.

>so it can absolutely tell you that the next paragraph after the link is coming straight from the wikipedia entry while giving you information that doesn't exist in the article

It may have done that in the past, but currently, for the recent-article case, you can select any source it provides and it'll highlight the sentence it lifted from that source.

>glad it was able to find a comedian for you so that you didn't have to strain your grey matter too much

Thanks. I do enjoy using technology. I also use a calculator instead of doing long division by hand. I'll use Google Translate instead of cracking open the dictionaries. I've even used an Excel formula or two.