Show HN: Open source, logical multi-master PostgreSQL replication

(github.com)

111 points | by pgedge_postgres 11 hours ago

9 comments

jwr 12 minutes ago
Bear in mind this no longer provides the same consistency model as PostgreSQL does. It's not a straightforward extension of the nice serializable world. That might not be what you expect given the name, this does not provide a strict serializable consistency model.
See https://jepsen.io/consistency/models for a classification of consistency models.
verelo 10 hours ago
Interesting, i always see attempts to make these types of database tools as super interesting but then I think about all the undocumented edge cases that can come up and they scare me off.
Many many years ago I worked on a monitoring tool that itself needed to be highly available, and we needed a solution like this. Ever since that time I've done everything in my power to avoid it.
What are the real world cases you built this for? And how can someone like me who has been bruised by past experiences get comfortable with it?
[-]
- jwr 18 minutes ago
  > edge cases that can come up and they scare me off
  They should! Read some of the excellent Jepsen analyses to see how scary things can be: https://jepsen.io/analyses
- pgedge_postgres 7 hours ago
  Just a guess, but some of the undocumented edge cases you saw might be explored in this blog from one of our software engineers, Shaun Thomas. It's all about conflict resolution & avoidance in PostgreSQL, in general: https://www.pgedge.com/blog/living-on-the-edge
  If understanding how conflicts are handled in pgEdge is helpful, here's a link to the docs on the subject: https://docs.pgedge.com/spock_ext/conflicts
  And the FAQ also delves into it some: https://www.pgedge.com/resources/faq
- vyruss 16 minutes ago
  Local write latency in a geo-distributed database is also important for some use cases.
- pgedge_postgres 7 hours ago
  Getting some examples of real-world cases to share and will comment back with them ASAP; in the meantime, would you mind sharing what undocumented edge cases you came across and what solutions you explored to handle them? It would help with sharing super relevant use cases :-)
- victor9000 7 hours ago
  What failure cases did you encounter?
- baq 4 hours ago
  Typical use case would be a anyone who has global presence, but serves users in particular geos (think AWS): you want a global user database but it’s soooo convenient to be able to join with regional data in a single query.
OsrsNeedsf2P 8 hours ago
If both nodes approve an update on the same primary key, what happens? I don't see this crucial detail described in the README
[-]
- pgedge_postgres 7 hours ago
  As a note, there's also specific documentation regarding this: https://docs.pgedge.com/spock_ext/conflicts
  And, one of our solutions engineers (Paul Rothrock) has a video released a month ago on this topic as well: https://www.youtube.com/watch?v=prkMkG0SOJE
  Sharing these alongside my other comment in case additional information is helpful :-)
- pgedge_postgres 7 hours ago
  Thanks for pointing out the lack of info on conflict resolution in the README! It's been reported and we'll look at getting that updated ASAP.
  In the meantime, you can find a lot of information in the official FAQ on how conflict resolution is handled (https://www.pgedge.com/resources/faq), but at-a-glance, "pgEdge offers eventual consistency between nodes using a configurable policy (e.g. last-writer-wins) for conflict resolution, along with conflict-free delta apply columns (i.e. CRDTs) for running sum fields. This allows for independent, concurrent and eventually consistent updates across multiple nodes."
  [-]
  - n_u 4 hours ago
    Cool project!
    How do you generate the timestamps for last writer wins? What happens if there is a tie?
    Just my 2c: if I see a distributed database, the first question I ask is how it handles distributed transactions. Perhaps this topic should be higher on your FAQ, currently it is the 21st question.
foreigner 3 hours ago
What are the pros and cons of this compared to CockroachDB?
[-]
- snthpy 3 hours ago
  Not the OP nor knowledgeable in this area but I would suspect / hope postgres compatibility as a start. The last time I looked into whether I could use cockroachdb as a backend for my Airflow cluster, it wasn't possible due to compatibility issues.
  [-]
  - pgedge_postgres 1 hour ago
    You're actually 100% correct! CockroachDB is only 57.25% compatible with standard PostgreSQL (according to https://pgscorecard.com, which details the way it comes up with these numbers) whereas we are 100% compatible (and 100% open-source, whereas they are source-available).
- znpy 1 hour ago
  License. CockroachDB moved to a license that I can’t even remember if it’s source-available anymore.
  [-]
  - pgedge_postgres 50 minutes ago
    It technically is source-available, as of Nov 2024, anyway: https://news.itsfoss.com/cockcroachdb-no-open-source/
    So yes, license (and compatibility - see https://pgscorecard.com) are two major differences between pgEdge and CockroachDB.
    pgEdge version updates also come in very close alignment with upstream PostgreSQL intentionally to make sure security patches/bugfixes and the latest features get to users ASAP.
pisikesipelgas 5 hours ago
Hi, How do You guys resolve the application database DDL issue when multimaster is in use? One node gets updated, DDL is will be replicated (?) to second node, which is used by not-jet-updated application which is not compatible with updated database structure. This problem has bugged me for a while. And second and similar issue with most replication setups is let's take for postgis for example. Again in one node this extension gets updated. Now what? Data will be replicated to node which is not jet updated and cause whole system to be not functional.
[-]
- baq 4 hours ago
  It’s an engineering problem: you have to design the system so that it remains functional in this exact scenario - it follows that the system isn’t just code and build artifacts, but also its deployment processes.
  [-]
  - pisikesipelgas 4 hours ago
    Hi, Thanks for the reply. This is what i figured too. So there is essentially no way to achieve this without service downtime when using application which is not written to handle those kind of situations (eg. 3rd party things).
    [-]
    - baq 3 hours ago
      Again an engineering problem. You can deploy with zero downtime and people have been doing this for decades. It takes infrastructure like load balancers, ability to run versions in parallel and runtime support for feature flags, but it’s absolutely doable and ultimately just another day in the office for anyone with global operations. A lot of 3rd party tools actually support these workflows for this exact reason.
      [-]
      - qaq 2 hours ago
        You gonna pause writes for cutover so while not downtime, specifically for postgres load balancers, ability to run versions in parallel not gonna help you there.
sgarland 9 hours ago
You do not want multi-master. If you think you do, think again.
Source: I have operated a large multi-master Postgres cluster.
[-]
- phs318u 8 hours ago
  Multi-master can be useful in cases where writes to the data are usually logically grouped by an attribute that correlates to the distribution of masters e.g. sales info by geography. The chances of write conflicts become much smaller (though not zero.
- pgedge_postgres 6 hours ago
  There's a lot of ways to approach the common problems found when running multi-master / active-active PostgreSQL. (A complete guide on this, specifically using PostgreSQL in general, was written by one of our solutions engineers, Shaun Thomas: https://www.pgedge.com/blog/living-on-the-edge)
  Could you elaborate on what problems you experienced?
- jasonthorsness 7 hours ago
  Agree; the part of the application requiring multi-master semantics is probably a small piece and can be handled outside the database where there is enough domain-specific knowledge that it can be made simpler and more obvious how conflicts for example are avoided or handled.
- bigwheels 8 hours ago
  I imagined this position would depend almost entirely on the requirements of the project. Are you able to elaborate on why it's a universal "NO" for you?
  [-]
  - gtowey 8 hours ago
    That's just the point, it always sounds like a great idea to people not experienced in database operations.
    The problem with the setup is you will have a data corruption issue at some point. It's not an "if" it's a "when". If you don't have a plan to deal with it, then you're hosed.
    This is why the parent is turning around the burden of proof. If you can't definitely say why you absolutely need this, and no other solution will do, then avoid it.
    [-]
    - bigwheels 8 hours ago
      Believe it or not, Mrs. Bigwheels is pretty experienced in the database department. I've seen multi-master HA architecture work out great for 10M+ DAU games, and many/most other cases where I wouldn't recommend it- as in it wouldn't even enter my brain, because the tradeoffs are harsh.
      IME it comes down to considering CAP against the business goals, and taking into account how much it will annoy the development team(s).
      If you follow "the rules" WRT to writes, it may fit the bill. Especially these days with beauties like RDS. But then again, Aurora is pretty awesome, and did not exist/mature until only ~5 years ago or so.
      Definitely more of a wart than a pancea or silver bullet. Even still, I wouldn't dismiss outright, always keen to compare alternatives.
      Overall it sounds like we're in the same camp, heh.
      [-]
      - porridgeraisin 8 hours ago
        What would you say are the primary tradeoffs?
      - riku_iki 7 hours ago
        > I've seen multi-master HA architecture work out great for 10M+ DAU games
        could you tell what kind of DB was that so we can understand if it is apple to apple comparison to multi-master PG?
throwawaygo 8 hours ago
pgactive?
tonyhart7 9 hours ago
how do they resolve write conflict????
[-]
- pgedge_postgres 7 hours ago
  The official FAQ has a good amount of info on how conflict resolution is handled (https://www.pgedge.com/resources/faq)!
  Relevant excerpt: "pgEdge offers eventual consistency between nodes using a configurable policy (e.g. last-writer-wins) for conflict resolution, along with conflict-free delta apply columns (i.e. CRDTs) for running sum fields. This allows for independent, concurrent and eventually consistent updates across multiple nodes."
  Some specific documentation on the subject: https://docs.pgedge.com/spock_ext/conflicts
  One of our solutions engineers (Paul Rothrock) created a video on this topic in the last month: https://www.youtube.com/watch?v=prkMkG0SOJE
  And if you're interested in more information about conflict management in PostgreSQL clusters in general, this article ("Living on the Edge: Conflict Management and You") from Shaun Thomas is probably useful to check out: https://www.pgedge.com/blog/living-on-the-edge
imglorp 8 hours ago
Third party, multi master postgres is such an old idea, it was done in Perl...
https://github.com/bucardo/bucardo
[-]
- pgedge_postgres 7 hours ago
  We're not claiming to be a new idea, by any means :-)
  Unfortunately, Bucardo is no longer being updated.
  Our goal is simply to support continued innovation of distributed PostgreSQL along with similar tools for enabling high availability / scalability in PG deployments.