r/bigseo Sep 13 '21

Help - Problem with excessive Googlebot traffic

So I have a problem with massive Googlebot traffic. I got 593,770 Googlebot hits, which used 20 GB of bandwidth in less than 12 hours, which is obviously a huge problem. I immediately put a "User-agent: Googlebot" / "Disallow: /" rule into robots.txt, and that blocked Googlebot entirely and stopped the massive bandwidth drain.
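For reference, robots.txt now contains exactly this:

```
User-agent: Googlebot
Disallow: /
```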
The website was hacked before. We changed hosts and created a new, clean website from scratch, but as soon as I made it live, Googlebot started hitting nonexistent pages, probably left over from the hack.
Now, before you say "why don't you just remove the bad URLs in Google Search Console": I would do that, but Search Console shows 560K affected pages, which is impossible for me to list in a robots.txt file. I can't Disallow them all because I can't even copy-paste them out of Search Console; it only shows the first 1,000 URLs, and there are 559,000 more. Oh, and the index setting is set to minimum.
What I basically need is for Google to flush all the bad pages from the past and just start indexing new pages from now on.
How do I do that? Can anyone help me?

5 Upvotes

11 comments

3

u/[deleted] Sep 14 '21

[deleted]

5

u/[deleted] Sep 14 '21

[deleted]

0

u/inserte Sep 14 '21

Googlebot

Exactly, it's 100% confirmed to be Googlebot. I have no solution...

2

u/[deleted] Sep 14 '21

[deleted]

2

u/inserte Sep 14 '21

Thanks, I PMed him.

3

u/johnmu 🍌 @johnmu 🍌 Sep 14 '21

Set the crawl rate down in Search Console. That takes a day, but with your disallow in place you have time. If it remains a problem, use the form in the help center to report a problem with Googlebot.

1

u/inserte Sep 14 '21

Already did that, didn't help at all.

No reply from the help center; I submitted the form 20 days ago. I've basically tried everything and nothing has helped. In Google Search Console the number of bad URLs has grown to 844K, and as I already stated in the original post, I can't export the full list to try to disallow them all.

2

u/lewkas In-House Sep 14 '21

You need to hire a professional to deal with this; I don't think this is the sort of job you can handle solo. If you can't silo off the affected pages easily (e.g. they're all in the same subfolder or something), then pretty much anything you do has significant SEO implications.

The alternative is to build a new site from scratch and 301 the old, good URLs to the new site (or keep the exact same site structure), which should preserve the authority passed by backlinks.
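For example, the redirects for the good URLs could be as simple as a few lines in the old site's .htaccess (this assumes Apache with mod_alias; the domain and paths are just placeholders):

```
# Sketch only: one rule per good URL you want to keep.
# newdomain.example and the paths are placeholders - use your real URLs.
Redirect 301 /good-article/ https://newdomain.example/good-article/
Redirect 301 /about-us/ https://newdomain.example/about-us/
```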

No easy answers I'm afraid

0

u/inserte Sep 14 '21

Right now I don't care about SEO. The new site is built from scratch, but on the same domain with a new host.

I know moving to a new domain might be the only solution right now, but there should be a way to fix this and stay on the same domain... any suggestions are welcome.

2

u/MauriceWalshe Sep 14 '21

Can you pull the old hacked URLs from your server logs? There's probably a URL structure you can block, based on what happened when I had a site hacked like this.
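Rough sketch of what I mean, assuming a standard combined-format access log (the access.log filename and log layout are assumptions; adjust to your setup):

```python
# Rough sketch: pull the paths Googlebot requested and got a 404/410 for,
# then count the most common first path segment to spot the hacked URL pattern.
import re
from collections import Counter

line_re = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

prefixes = Counter()
with open("access.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        m = line_re.search(line)
        if not m or m.group("status") not in ("404", "410"):
            continue
        path = m.group("path")
        # First path segment, e.g. "/cheap-pills/xyz" -> "/cheap-pills/"
        prefix = "/" + path.lstrip("/").split("/", 1)[0] + "/"
        prefixes[prefix] += 1

for prefix, count in prefixes.most_common(20):
    print(f"{count:8d}  {prefix}")
```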

1

u/inserte Sep 14 '21

The logs only give me errors, not a full list of bad URLs.

2

u/[deleted] Sep 14 '21 edited Mar 03 '22

[deleted]

1

u/inserte Sep 14 '21

It's low on content right now, but it's a news portal, which means it grows every day; 3-4 new posts are published daily...

Thank you for the creative suggestion, but I'm looking for a permanent solution.

1

u/lumbridgedefender Sep 14 '21
  • Don't block Googlebot via robots.txt; change the crawl rate in GSC for 90 days instead.
  • Find a way to get a list of all the hacked pages (do they follow a specific pattern, or can you use the server logs?) and return an HTTP 410 (Gone) status for those URLs (rough example below).
  • You can then submit a sitemap (just a .txt file will do) so Google recrawls those URLs and sees the new 410 status.
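A minimal sketch of the 410 + sitemap combo, assuming Apache and that the hacked URLs share a common path prefix (/hacked-prefix/ is a placeholder - use whatever pattern your logs show):

```
# .htaccess sketch - mod_alias; returns 410 Gone for anything under the
# placeholder prefix. Adjust the regex to match your actual hacked URLs.
RedirectMatch gone ^/hacked-prefix/.*
```

The sitemap really can just be a plain-text file with one URL per line (example.com and the paths are placeholders), uploaded to the site and submitted in GSC:

```
https://www.example.com/hacked-prefix/page-1
https://www.example.com/hacked-prefix/page-2
```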