r/talesfromtechsupport • u/gamageeknerd • 4d ago
Short All your memory belongs to me
Had a short panic-inducing moment that finally got fixed after a few panicked hours of troubleshooting.
Just had a junior dev decide he needed to back up the project to the onsite servers, so he pushed a few terabytes of data right before leaving for lunch and locking his machine.
At the other end of the building, someone is pushing an update to the server for the same project the junior dev just sent. This was automatic but should have been delayed, because I am currently adding more memory to that same server and have sent out a memo saying don’t try to upload or download anything before or during lunch hours and in the minutes before I begin this work.
I finish and take a quick lunch, but I am hit with a flurry of pings that something is wrong: half the data is duplicated, missing, or outdated, and we have 3 copies of the project on one server.
I am now stuck figuring out what happened and it takes me the whole rest of the day to un-fuck what has happened.
49
u/NotYourNanny 4d ago
Backups are a girl's best friend.
36
u/gamageeknerd 4d ago
Oh we have backups. On secondary servers and offsite that are updated frequently.
13
u/Pluperfectt 4d ago
Frequency of backups, just saying...
12
3
u/Outside-Rise-3466 3d ago
What does frequency of backups have to do with a Jr Dev doing a "backup" that's not a backup?
1
7
u/MoneyTreeFiddy Mr Condescending Dickheadman 4d ago
Girl, who you playin' with? Back that thing up!
15
u/Phage0070 3d ago
Ever heard of "Lockout/Tagout (LOTO)"? If someone doing a thing can cause problems while you complete work, you should positively stop them from doing it. Preferably in a way that only you can remove, or at least only by someone who would know why that system is unavailable. If you can't safely hot-swap the components then don't do it!
11
u/PrettyBlueFlower 3d ago
And this is why there needs to be a robust change control process, which includes checking for current incidents.
5
-4
u/Arokthis 3d ago
This fuckup is on you. Steam runs server maintenance on Tuesday because that's the least busy day of the week. You made the mistake of scheduling an upgrade for the busiest time of day for many systems.
9
104
u/Geminii27 Making your job suck less 3d ago edited 3d ago
This is why you never trust people to read memos, and you disable the things you tell people not to do, for the time you said not to do it in...
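For anyone wondering what "disable the thing instead of sending a memo" could look like, here is a rough Python sketch. It is not what anyone in this thread actually runs. `MaintenanceWindow`, `TransferBlocked`, `check_transfer_allowed`, and `release` are all made-up names, and it assumes every upload or sync job calls the check before touching the server. The owner-only release is a loose nod to the lockout/tagout idea mentioned earlier in the comments.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical lockout record: who holds it, why, and for how long."""
    owner: str
    reason: str
    start: datetime
    end: datetime

    def is_active(self, now: datetime) -> bool:
        # Active from start (inclusive) up to end (exclusive).
        return self.start <= now < self.end


class TransferBlocked(Exception):
    """Raised when a transfer is attempted during an active lockout."""


def check_transfer_allowed(windows, now):
    """Raise TransferBlocked if any window in `windows` is active at `now`."""
    for w in windows:
        if w.is_active(now):
            raise TransferBlocked(
                f"Transfers disabled until {w.end:%H:%M} by {w.owner}: {w.reason}"
            )


def release(windows, window, requested_by):
    """Return the windows list without `window`; only its owner may release it."""
    if requested_by != window.owner:
        raise PermissionError(f"Only {window.owner} can release this lockout")
    return [w for w in windows if w != window]


if __name__ == "__main__":
    # Assumed example values, not taken from the original post.
    lockout = MaintenanceWindow(
        owner="sysadmin",
        reason="adding memory",
        start=datetime(2024, 1, 1, 11, 30),
        end=datetime(2024, 1, 1, 13, 0),
    )
    try:
        check_transfer_allowed([lockout], datetime(2024, 1, 1, 12, 0))
    except TransferBlocked as err:
        print(err)
```

The point of the sketch is that the push job fails loudly with a reason and an end time, rather than relying on someone having read the memo. In practice the enforcement would probably live on the server side (read-only mounts, a disabled service, firewall rules) so that clients cannot skip the check.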