We have been asked to deliver a system where the central database server continues its operations, even when applying security updates to the server's operating system or to the database server software. As far as I see it, this includes security updates that require the server(s) to reboot.
Cluster technology seems obvious, but if a server can really reboot while the cluster is in use, I have a couple of questions:
- Which database products can do this?
- How does it work? Does it store the database data on all servers simultaneously, or is one server's tasks transferred to another while it is rebooting?
- How does it affect performance, especially latency of queries?
-
2 Ways you can do this, VMware FT (limited to 1 CPU though), and the other is clustering technology.
VMware FT has 0 latency issues, BUT you are limited to 1 CPU, and the clustering solution often has about a 15 second "failover" time as the TCP session fail over to the new server, and the old TCP sessions timeout, including ARP refreshing on the local network.
sleske : Interesting, didn't know about VMWare Fault Tolerance (FT). Here's an interesting link I found: http://www.virtualization.info/2008/10/vmware-fault-tolerance-overview-and.htmlsleske : Would an OS-level failover (e.g. VMWare FT) even work? The question mentions updates/reboots of the DB software. Wouldn't that require redundancy at the DBMS level? Otherwise you'd have to run both nodes of the cluster in parallel during DBMS upgrades (one for requests, one to run the update on). How would they sync without redundancy at the DBMS level?Bart Silverstrim : I would think you'd set up a cluster of VMWare servers to run your virtual DB server. Take down VMWare node 1 for updates, the second node brings up the VM automatically, so it looks like the virtual server never went down. That would mean shared storage from which to run your database on a SAN or something of the sort, but the "non-clustered" database server wouldn't need to be aware of it since the (virtual) server itself is clustered. Not necessarily cheap, but it would work at the machine level.Bart Silverstrim : I should probably add that my statement was from research on clustering VMWare servers, not specifically VMWare FT. It sounds like (in the comments) that there is a *significant* overhead to running their FT tech, and hardware specs are increased (4 nics? really?)Lars D : How should VMware solve this? Normally I would put the OS on top of VMware, and the database server on top of the OS. So when I need to update the VM with patches, and reboot the VM, how do I avoid that the database transactions are interrupted?From Stephen Thompson -
There are a few ways to go about this. Clusters at the OS level can work, with a brief outage when you move from one node to another. You did not specify your OS platform. Most of the ?NIX platforms have a robust clustering solution.
As far as the DB platform, Oracle has their shared everything approach with RAC where you can bring down a single node and everything will get moved to the other node(s) in the cluster. It allows you to do maintenance on a node while the other nodes keep running and servicing clients. They all utilize the same set of disks. The effect on performance depends upon the hardware sizing, most places size their hardware to N+1 capacity to make sure performance is not effected while doing this type of activity.
Informix has something similar now in their latest release. DB2 is supposed to get this soon.
Lars D : What are the latency implications of having an Oracle cluster, compared to an Oracle with a local disk?Kevin K : Not sure about what you mean by latency. Most Oracle installations these days are SAN based and do not use local disks. The SAN should be in the same data center with multiple high speed paths between the hosts and the SAN. Latency within a data center should not be an issue.Lars D : Latency = time it takes to make one query.Lars D : Many designes introduce latency for queries. For instance, the seek time of a disk is often a significant factor of latency, but network latency also adds to it. Bad SAN structures can cripple a database server, but often SAN only slows down things because the app is optimized towards being in charge of harddisk seeks.Kevin K : I've not heard latency used in this manner, but the implementation is not a one size fits all. The shared everything approach has been very misused. Depending upon your application it can lower latency dramatically, or increase it. There are many configuration options like being abel to point different classes of users to different hosts (long running queries on one, loads on another, regular users to a third). A little more detail on the app would be helpful.From Kevin K -
I believe the only way to do this is using clustering. You will need several DB servers that are combined into a cluster. Then one server can automatically take over for another that has failed. This is referred to as "failover" (or high-availability cluster).
To address your questions:
Which database products can do this?
All that advertise "clustering support". I know at least MySQL and Oracle do, but many other DBMS probably also support it.
How does it work? Does it store the database data on all servers simultaneously, or is one server's tasks transferred to another while it is rebooting?
Both. The servers regularly synchronize their data, so it's kept on all servers. As to which server actually responds to requests, there are two options: In a load-balancing cluster, all servers share the load (so you get better performance), in a high-availability cluster, one computer normally does the work, and the spare takes over if it fails (failover).
How does it affect performance, especially latency of queries?
Sorry, I don't have experience with that. Normally, overhead should be minimal, but failover might take some time and cause timeouts.
From sleske -
MS SQL can cluster across multiple servers - requires shared disk from a different server. MySQL can replicate data with master/slave relationship across multiple nodes. Oracle RAC will create a cluster with multiple nodes. Sybase rep server can replicate data across multiple servers.
And, yes, you could just run everything in VMWare then use FT or Motion to move the OS across nodes running with the data stored in a SAN.
Lars D : * Failover is not an option - it must be continuous operation. * VMware FT does not seem to support that the database server continues its operations while the virtualized (guest) OS is rebooting, am I correct on this? * Can you describe the MSSQL solution in details? I assume that the shared disc can be supplied by a group of servers that can also reboot without taking the disk offline. Please describe the performance implications of such a solution, compared to an MSSQL solution with a local disk.Alex : From what I remember you can't create an MS cluster of any sort without a shared disk. Effectively you're looking at 2 needs - the database and the disk. Honestly, I'm not the best person on this, I would talk to MS directly, but, I know that our MS SQL environment is run with a cluster of multiple nodes and a cluster hostname to virtualise them into one host.From Alex -
No interrupt at all during scheduled maintenance including a restart of the OS? Oracle RAC. It's the only real option I can think of, and certainly the only parallell cluster database I would trust for this. Even RAC must sometimes go down for database patches but most can be applied while running.
If you can handle at least 10-15 seconds downtime, there are a number of other options including clustering at application level (veritas cluster, microsoft cluster, oracle clusterware) or replication at the database level. A virtual infrastrucutre on it's own won't help much. The OS still has to go down.
It is also possible to combine replicated databases with a multihomed client for uninterrupted production, allthough I can't remember the name of any such clients, at the moment anyway.
I might add that you'll probably want to go with some sort of *NIX to keep them reboots to a minimum. As far as I remember there has only been one update worth rebooting for on RHEL and OEL the last couple of years.
Oracle RAC is a parallell cluster. The database is stored on shared storage and accessed by all nodes simultaniously. Done right it should improve overall performance in most cases, and yield little or no difference in query response times. This is complex technology, however, and doing it right is far from trivial.
There are a few other parallell technologies that promise five nines (99,999% uptime, equalling 5 minutes downtime per year) but they are either too old (VAX) or too new (NDB).
From Roy -
I would say one way to do this would be master-master replication using MySQL. Make sure your application is multihomed to use the second master if the first one is unavailable, then you can bring a single master down while the other stays up for both reads and writes. When your second server comes back, just flip it the other direction. Table inserts happen with PK values spaced 2 apart instead of 1 apart, but that is fine, it's just a key.
Brent : While this works 98% of the time, there are still issues where transactions are processed in a different order, or certain commands such as DROP TABLE are not replicated properly.From Zak -
The difference between a reliable system and one with zero downtime is the difference between putting an aluminum balloon into low earth orbit and putting a person on the moon and getting them back again safely.
I would look at the old-school ways of doing this, which in my opinion are the ones that you should be looking at if you need it to work the first time and not blow the budget.
The old standbys are OpenVMS clusters and Tandem (now HP) NonStop. Both of these are designed for several computers running exactly the same DB and same code. Both are designed to provide 100% uptime even through OS and software upgrades and patches. Both have a proven, decades long track record of working properly.
Now -- there are modern things that will provide this, on paper. In practice, you'll run into issues like "oops, we made a mistake in our license server and your VMs now won't boot." In a decade, I'm sure these technologies will be tested and proven to be reliable, but until then, if you need it to work, be very conservative in which stories you believe.
And, lastly, the most important thing in making a system this reliable is to design it well, build it well, and take care of it well because in practice, the least reliable thing in the equation is the person behind the keyboard.
Kyle Hodgson : +1 for Tandem. Though I'm hopeful Remus / VMWare will make this easier and cheaper to do in a less proprietary fashion.Lars D : The human factor is usually the biggest downtime producer, either in operation or development of the app - I'm totally aware of this, but I still want info :-)Massimo : +1 for the wonderful comparison with space travel :-)chris : The reference to Tandem and OpenVMS was a little in jest -- it's unlikely that you'd be able to sell those as solutions to a new problem. My point is mostly that: 1) you may want "carrier grade +" uptimes but please look at the expense -- do you really need it? 2) There are lots of ways of achieving this goal but to really have *proven* 5+ nines of reliability, you need a large dataset. Perfect on paper is much worse than perfect over many decades long deployments. You'll see lots of perfectly valid designs that may or may not have proven implementations.From chris -
MySQL Cluster http://www.mysql.com/products/database/cluster/
- Shared Nothing architecture (central storage is not required).
- Rolling upgrades - update without stopping the cluster.
- You can specify how many copies of your data should exist in the cluster.
- Historically has been an in-memory database, meaning your total database could not exceed the amount of RAM in your cluster (minus overhead for replication).
- Now supports on-disk databases too.
- Doesn't have all the features of some of the other MySQL storage engines.
Roy : I assume this is the new name of NDB. Do you know if indexed data can be stored on disk now? Have they overcome the loading time problem after a crash?elijahbuck : I don't know. I've only used it in-memory.From elijahbuck -
I haven't heard of some of the other mentioned solutions, so I can't compare to them, but since I don't see the one I'm familiar with here, I'll mention it too.
That is MySQL on top of a DRBD filesystem. With linux heartbeat as described here
We used this for a couple of years with no real downtime. Our only issue was that we ran our cluster on virtual machines, and it really needs to be on physical boxes with multiple paths between them (such as ethernet and serial cable, etc)
The way this works is that DRBD is like raid across multiple machines, where it keeps the underlying filesystem in continuous sync between two or more hosts, while heartbeat only allows the filesystem / database to only be live on one server at a time.
Failover when one goes down is very fast - and can be tuned to be even faster if the connections between the machines are redundant and very reliable. (this was our problem using virtual machines). Also, by failing over prior to a scheduled reboot, even this can be minimized.
From Brent -
I look for solutions that can keep the transaction open, even when the machine, on which the database server software is installed (Virtual or physical) is rebooting.
I think you will need to look at HA-JDBC to meet that requirement: http://ha-jdbc.sourceforge.net/
"High-availability/Fault Tolerance - An HA-JDBC database cluster can lose a node without failing/corrupting open transactions."
Cheers
From Jason -
MSSQL with Windows Clustering would handle 0 downtime maintenance windows PROVIDED you fail the node you are going to work on BEFORE you start work. Additionally, you'll need to configure NLB on the hosts to make sure that all connections are handled via a shared IP Address (otherwise, there could be as much as 2 or more seconds of downtime while servers re-try dns, etc). To make clustering work, you'll need a shared storage array, like iSCSI, and 2 or more physical hosts (Hypervisors need upgrades too).
Here's some pretty good information on what that environment would look like, but basically if you cannot have downtime, you will need to have at minimum one MS SQL DBA on-staff and on call to ensure that all failovers happen correctly, and you can't go cheap on ANYTHING. Call Microsoft and go by their book on this, or better yet put your app in the cloud on Azure or a dedicated server vendor who specializes in high availability.
0 comments:
Post a Comment