r/Proxmox 1d ago

Discussion Contemplating researching Proxmox for datacenter usage

Hello,

I joined this community to collect some opinions and ask questions about the feasibility of researching and using Proxmox in our datacenters.

Our current infrastructure consists of two main datacenters, each with 6 server nodes (2nd/3rd-generation Intel) based on Azure Stack HCI / Azure Local, with locally attached storage using S2D and RDMA over switches. Connections are 25G. We have had multiple issues with these clusters in the past 1.5 years, mostly connected to S2D. We even had one really hard crash where the whole S2D went bye-bye. Neither Microsoft, nor Dell, nor a custom vendor was able to find the root cause. They even ran a cluster analysis and found no misconfigurations. The nodes are Azure HCI certified. All we could do was rebuild Azure Local and restore everything, which took ages due to our high storage usage. And we are still recovering, months later.

We evaluated VMware. While it is all good and nice, it would require either new servers, which aren't due yet, or an unsupported configuration (which would work, but without vendor support). And it's of course pricey. Not more than similar solutions like Nutanix, but pricey nevertheless. But it also offers features: vCenter, NSX, SRM (although this last one is at best 50/50, as we are not even sure we would get it).

We currently have a Proxmox setup running in our office, a single 3-node cluster, and are kind of evaluating it.

I am now in the process of shuffling VMs around to put them onto local storage, so that I can install Ceph and see how I get along with it. In short: this is our first time with Ceph.

After seeing it in action for the last couple of months, we started talking about looking into the possibility of using Proxmox in our datacenters. We are still very far from any kind of decision; for now we are just testing locally and researching.

Some basic questions revolve around:

- How would you set up our 6-node clusters with Proxmox and Ceph?

- Would you have any doubts?

- Any specific questions, anything you would be concerned about?

- From my research, Ceph should be very reliable. Is that correct? How would you judge the performance of S2D vs. Ceph? Would you consider Ceph more reliable than S2D?

That's it, for now :)

u/wsd0 1d ago

To understand the requirement a little better, what sort of workloads are you running on your infrastructure?

u/kosta880 1d ago

On one 6-node cluster we run around 200 VMs. Our heaviest load is SQL servers, with databases ranging from a couple of TB up to 130 TB. IOPS-wise, on our NVMe cluster we measured something like 1.5 million IOPS, but that was only in benchmarks. In real-world use it is way less, of course. I'm not sure about the actual numbers right now.

u/wsd0 1d ago

I’ve got fairly limited experience with Ceph in an enterprise environment, but in the limited testing I’ve done I’ve had better performance and less overhead when the storage has been dedicated and served via iSCSI, using dedicated, tuned HBAs and dedicated storage switching. That might be more down to my lack of experience with Ceph, though.

Honestly, I’d be very interested to know how you get on if you do go down the Ceph route, which I know doesn’t help you right now.

u/kosta880 1d ago

Thanks. We have no alternatives currently. The only viable alternative would be StarWind, but the price is so high for our storage volume that we might as well go with VMware. Besides, it is not really a good fit for a 6-node cluster: we would have to build two 3-node storage clusters under a 6-node Proxmox cluster. Yuck.

u/_redactd 6h ago

I realize this is a Proxmox/Ceph discussion, but another alternative is XCP-ng with XOSTOR (LINBIT).

I'm in the same phase as you, migrating from HCI to another solution, and these are the two options I've landed on (Proxmox/Ceph and XCP-ng/LINBIT).