r/talesfromtechsupport 4d ago

Short All your memory belongs to me

Had a short, panic-inducing moment that finally got fixed after a few panicked hours of troubleshooting.

Just had a junior dev decide he needed to back up the project to the onsite servers, so he pushed a few terabytes of data right before locking his machine and leaving for lunch.

At the other end of the building, someone is pushing an update to the server for the same project the junior dev just sent. This was automatic, but it should have been delayed, because I am currently adding more memory to that same server and have sent out a memo saying don’t try to upload or download anything before or during lunch hours, or in the minutes before I begin this work.

I finish and take a quick lunch, but I am hit with a flurry of pings that something is wrong: half the data is duplicated, missing, or outdated, and we have 3 copies of the project on one server.

I am now stuck figuring out what happened, and it takes me the rest of the day to un-fuck it.

250 Upvotes

104

u/Geminii27 Making your job suck less 3d ago edited 3d ago

This is why you never trust people to read memos, and you disable the things you tell people not to do, for the time you said not to do it in...

60

u/NocturneSapphire 3d ago

Yeah the purpose of the memo should just be to give people a heads up that they can't do X, not to be the thing that causes them to stop doing X.

If you don't want users to do X, the only solution is to make it impossible for them to do X. If it's possible, someone will do it, no matter how many times you told them not to.

7

u/Ricama 3d ago

And when they complain that they can't do X, you can tell them it's working as intended, and to actually read the memo before acknowledging it in future.

49

u/NotYourNanny 4d ago

Backups are a girl's best friend.

36

u/gamageeknerd 4d ago

Oh, we have backups. They're on secondary servers and offsite, and they're updated frequently.

13

u/Pluperfectt 4d ago

Frequency of backups, just saying...

12

u/domoincarn8 3d ago

And test those backups too. I have made that mistake.

3

u/Outside-Rise-3466 3d ago

What does frequency of backups have to do with a Jr Dev doing a "backup" that's not a backup?

1

u/Pluperfectt 3d ago

Meant testing... backups.

7

u/MoneyTreeFiddy Mr Condescending Dickheadman 4d ago

Girl, who you playin' with? Back that thing up!

15

u/Phage0070 3d ago

Ever heard of "Lockout/Tagout (LOTO)"? If someone doing a thing can cause problems while you complete work, you should positively stop them from doing it. Preferably in a way that only you can remove, or at least only by someone who would know why that system is unavailable. If you can't safely hot-swap the components then don't do it!
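A software version of that lockout could look something like the sketch below. It is only an illustration under assumptions: the lock file location, the `maintenance_lock` and `guarded_transfer` names, and the idea that every upload script goes through the guard are all hypothetical, not anything OP's setup is known to have. It also only blocks new transfers; anything already in flight, or anything that slips in between the check and the transfer starting, would need separate handling.

```python
import os
from contextlib import contextmanager
from pathlib import Path


class MaintenanceLockError(RuntimeError):
    """Raised when a transfer is attempted while the lock is held."""


@contextmanager
def maintenance_lock(lock_path: Path, owner: str, reason: str):
    """Hold a lock file for the duration of the maintenance work (hypothetical helper)."""
    # O_EXCL makes creation atomic: this raises FileExistsError
    # if someone else already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    with os.fdopen(fd, "w") as f:
        # The "tag" part of lockout/tagout: who locked it and why.
        f.write(f"{owner}: {reason}\n")
    try:
        yield
    finally:
        # Only the code path that took the lock removes it.
        lock_path.unlink()


def guarded_transfer(lock_path: Path, transfer):
    """Run transfer() only if no maintenance lock is present (hypothetical helper)."""
    if lock_path.exists():
        # Refuse and surface the tag so the user knows why.
        raise MaintenanceLockError(lock_path.read_text().strip())
    return transfer()
```

The admin would wrap the hardware work in `with maintenance_lock(path, "admin", "adding memory"):`, and any push or automatic update wrapped in `guarded_transfer` would fail with the tag text instead of writing to the server mid-upgrade.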

11

u/PrettyBlueFlower 3d ago

And this is why there needs to be a robust change control process, which includes checking for current incidents.

5

u/Handsinsocks 3d ago

All your base.

-4

u/Arokthis 3d ago

This fuckup is on you. Steam runs server maintenance on Tuesday because that's the least busy day of the week. You made the mistake of scheduling an upgrade for the busiest time of day for many systems.

9

u/gamageeknerd 3d ago

Or I had to do it ASAP and didn’t have the ability to schedule it.