Extending PostgreSQL with Java: Overcoming Integration Challenges

Why Bridge Java with C in the First Place?

Bridging Java and C combines the strengths of both languages. A C application may rely on Java for modern libraries, cloud APIs, or UI and web capabilities, while a Java app might need C for low-level system access or performance-critical tasks. Sometimes, there’s simply no alternative—certain features only exist in one language. While modern languages like C++ and Go offer both high- and low-level control, many systems aren’t written in them. For existing C or Java codebases, bridging is often the most practical way to extend functionality without a full rewrite.

In my case, the goal was to build a C-based PostgreSQL extension called SynchDB that integrates with the Java-based Debezium Embedded library to enable heterogeneous database replication into PostgreSQL. Debezium already provides mature connectors for databases like MySQL, SQL Server, and Oracle, so rather than reinventing the wheel in C, I chose to bridge the two runtimes using JNI. This approach allows PostgreSQL to consume change data from other systems in real time. However, maintaining both C and Java components within the PostgreSQL extension framework introduces unique challenges—such as cross-language memory management, threading compatibility, signal handling, and debugging across runtime boundaries. Let’s explore some of those next.

SynchDB – A Heterogeneous Database Replication Tool for PostgreSQL

This project serves as a practical case study in the complexities of bridging Java and C, highlighting the technical challenges and design decisions involved in maintaining two runtimes under PostgreSQL’s extension framework. In the basic architecture diagram below, the yellow box represents the Java space (the Java Virtual Machine, or JVM), the blue box represents the PostgreSQL extension (C space), and the orange box represents the PostgreSQL core.

The SynchDB extension works by first instantiating a JVM through the Java Native Interface (JNI) and running a pre-compiled .jar file that contains the driver class for the Debezium embedded engine. It then periodically invokes another Java method to obtain change events, processes them, and finally applies them to PostgreSQL.

Java Native Interface (JNI) – the Bridge that Needs Caution

JNI is a C-based API provided by the JVM (via libjvm.so) that allows native applications to interface with Java code. In the context of SynchDB, it’s used to create and control a Java Virtual Machine (JVM) from within a PostgreSQL C extension, enabling direct access to Debezium’s Java-based change data capture libraries. While this integration is powerful, it comes with performance trade-offs. Calls that cross the JNI boundary are significantly slower than ordinary Java-to-Java method calls, and the overhead becomes more pronounced as call frequency increases. Efficient usage and careful batching are essential to minimize this impact.

JNI – Things to be Aware of

Tricky to Debug

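  • The JVM and the C code share one process, so a segmentation fault on the C side takes down the JVM, and a fatal JVM error takes down the C side.
  • Reading stack traces that span two runtimes can be tricky.
  • Exceptions raised by the Java app need to be explicitly checked on the C side, in addition to checking return codes.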

Resources

  • No garbage collection on the C side of JNI.
  • No safety checks on the C side of JNI (array bounds, NULL pointers, etc.).
  • Requires manual resource management on the C side.

Threading

  • The current thread (and any other thread that calls into Java) needs to be attached to the JVM while it uses JNI.
  • A JVM can be shared among threads in the same process.
  • A JVM cannot be shared across processes.

2 Very Different Memory Models

JVM

  • Maximum heap memory for the JVM needs to be set prior to startup (don’t rely on the JVM’s default).
  • Unused memory is automatically garbage-collected.
  • Demands more heap memory for:
    • Processing a large number of tables.
    • Processing change events faster.
    • Running multiple JVMs in multiple workers.
  • An OutOfMemoryError is thrown when the maximum heap memory is exhausted.

PostgreSQL

  • No need to set a maximum heap memory size prior to startup.
  • Memory is allocated in a memory context for automatic lifetime management, or explicitly freed via pfree().
  • Choosing the right memory context based on the lifetime of a resource is important.

When a JVM is in the picture, we need to be extra careful to set the right maximum heap memory for it. The right value depends mostly on the use case, the total available memory, and the number of JVM workers that will run concurrently.
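To make the sizing arithmetic concrete, here is a minimal sketch that splits a memory budget evenly across concurrent JVM workers and formats the result as an -Xmx option. The class name HeapBudget, the even split, and the numbers are assumptions for illustration only, not how SynchDB decides its heap size.

```java
/** Hypothetical helper: splits a heap budget evenly across JVM workers. */
final class HeapBudget {
    private HeapBudget() {}

    /**
     * @param jvmBudgetMb memory in MB set aside for the heaps of all JVMs combined
     * @param workers     number of JVM workers expected to run concurrently
     * @return maximum heap per worker in MB (integer division, rounds down)
     */
    static long perWorkerHeapMb(long jvmBudgetMb, int workers) {
        if (jvmBudgetMb <= 0 || workers <= 0) {
            throw new IllegalArgumentException("budget and workers must be positive");
        }
        return jvmBudgetMb / workers;
    }

    /** Formats a heap size as a JVM startup option, e.g. "-Xmx1024m". */
    static String xmxOption(long heapMb) {
        // Note: the heap is only part of a JVM's footprint (metaspace,
        // thread stacks and native buffers come on top of it).
        return "-Xmx" + heapMb + "m";
    }
}
```

For example, a 4096 MB budget shared by 4 workers gives each JVM a 1024 MB heap, which would be passed at JVM creation as -Xmx1024m.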

Signal Handler Conflicts

PostgreSQL background workers rely heavily on Unix signals for internal communication and process control. SIGUSR1 is used for inter-process notifications such as latch wakeups and logical replication triggers, while SIGTERM and SIGQUIT drive lifecycle events like shutdowns and crash handling, with SIGKILL (which no process can catch or handle) as a last resort. Additionally, SIGHUP is used for configuration reloads. These signals are central to PostgreSQL’s architecture and must be handled carefully to avoid disrupting its behavior.

Embedding a JVM introduces complications, as the JVM and Java apps often install their own signal handlers. This can interfere with PostgreSQL’s signal handling and cause unexpected behavior. To avoid conflicts, you can either reduce the JVM’s use of operating system signals with the -Xrs option, or implement signal chaining, in which signals are intercepted and forwarded to the original handlers. While chaining offers flexibility, it requires careful implementation.

Batching and Throttle Control

Batching helps mitigate JNI’s performance overhead by grouping multiple change events (for example 1,000, 2,000, or even 16,000 entries) into a single call, reducing the frequency of cross-language invocations. However, while fetching changes is typically fast, processing them can be slower, so unprocessed batches can accumulate in the queue and JVM memory usage can grow rapidly. To prevent this, a throttle control mechanism is used: when the batch queue becomes full, the connector is temporarily paused until enough batches are processed and memory usage drops to a safe level.
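The throttle idea can be sketched on the Java side roughly as follows. This is a simplified illustration, not SynchDB’s actual code: the class ThrottledBatchQueue, its method names, and the use of a batch count (rather than measured memory) as the pause and resume thresholds are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/** Hypothetical throttle: pause the producer when full, resume at a low-water mark. */
final class ThrottledBatchQueue<T> {
    private final Queue<List<T>> batches = new ArrayDeque<>();
    private final int capacity;     // pause once this many batches are queued
    private final int resumeBelow;  // resume once the queue drains to this size
    private boolean paused = false;

    ThrottledBatchQueue(int capacity, int resumeBelow) {
        if (capacity <= 0 || resumeBelow < 0 || resumeBelow >= capacity) {
            throw new IllegalArgumentException("need 0 <= resumeBelow < capacity");
        }
        this.capacity = capacity;
        this.resumeBelow = resumeBelow;
    }

    /** Connector side: returns false while paused, so the caller should back off. */
    synchronized boolean offer(List<T> batch) {
        if (paused) {
            return false;
        }
        batches.add(batch);
        if (batches.size() >= capacity) {
            paused = true; // queue is full: stop accepting new batches
        }
        return true;
    }

    /** Consumer side (e.g. the native caller): returns the oldest batch, or null if empty. */
    synchronized List<T> poll() {
        List<T> batch = batches.poll();
        if (paused && batches.size() <= resumeBelow) {
            paused = false; // enough batches processed: let the connector continue
        }
        return batch;
    }

    synchronized boolean isPaused() {
        return paused;
    }

    synchronized int size() {
        return batches.size();
    }
}
```

Using separate pause and resume thresholds avoids flapping between the two states when the consumer drains the queue one batch at a time.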

Coordination is Important

Ensuring synchronized coordination between the Java and C components is critical, especially when the Java side runs a standalone service like a Debezium connector. Since the connector operates independently within the JVM, it may encounter exceptions or shut down unexpectedly. In such cases, the C component must be promptly notified to handle the failure gracefully—whether by logging, restarting, or halting replication—so that the system remains stable and consistent.
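One way to make such failures visible to the C side is to have the Java component record its own state so the native caller can poll it on each JNI call. The sketch below is an assumption-laden illustration: ConnectorMonitor, its states, and its accessor names are hypothetical, and the Runnable stands in for whatever actually runs the connector.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical status holder that the native side could poll through JNI. */
final class ConnectorMonitor {
    enum State { STOPPED, RUNNING, FAILED }

    private final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);
    private volatile String lastError = "";

    /** Runs the connector task and records the outcome instead of losing the exception. */
    void run(Runnable connectorTask) {
        state.set(State.RUNNING);
        try {
            connectorTask.run();
            state.set(State.STOPPED);
        } catch (RuntimeException | Error e) {
            // Record the message before the state, so a reader that sees FAILED
            // also sees the matching error text.
            lastError = String.valueOf(e.getMessage());
            state.set(State.FAILED);
        }
    }

    /** Plain strings keep the JNI side simple: no enum handling needed in C. */
    String stateName() {
        return state.get().name();
    }

    String lastError() {
        return lastError;
    }
}
```

The C component could then call stateName() alongside each fetch and decide whether to log, restart, or halt replication when it reads FAILED.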

Concurrency Concerns

The JVM is designed as a single-process runtime, typically started via java -jar or in our case embedded within a PostgreSQL background worker using JNI. It does not support running multiple JVM instances within the same OS process. However, multiple threads within a single process can share the same JVM instance.

Since PostgreSQL is a multi-process system where each backend or background worker runs in its own OS process, each process that needs Java access must initialize its own separate JVM instance to operate independently and in parallel. This allows heterogeneous replication to run in parallel with higher performance, as long as each worker is responsible for a different set of tables. Same-table parallelism remains a tricky topic, because stricter transaction boundaries and change ordering need to be enforced.

What if we insist on using threads to share one single JVM?

Sure, but be extra cautious: threads can only be used to access Java resources and parse the data in parallel. They cannot apply the changes to PostgreSQL in parallel (at least not yet). Problems such as spurious stack overflow aborts, invalid transaction states, and deadlocks could occur if we attempt to do so.
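The "parse in parallel, apply serially" shape can be sketched as below. It is written in Java only to keep the example self-contained; in the extension the worker threads would be native threads attached to the JVM, and the apply step would be the PostgreSQL side. The class ParallelParseSerialApply and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical pipeline: many threads parse, one thread applies in source order. */
final class ParallelParseSerialApply {
    private ParallelParseSerialApply() {}

    static <R, P> void process(List<R> raw, Function<R, P> parse,
                               Consumer<P> apply, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Parsing is submitted to the pool and may run concurrently.
            List<Future<P>> parsed = new ArrayList<>();
            for (R event : raw) {
                Callable<P> task = () -> parse.apply(event);
                parsed.add(pool.submit(task));
            }
            // Applying happens only on the calling thread, in the original order.
            for (Future<P> future : parsed) {
                apply.accept(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a parse", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("parsing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Waiting on the futures in submission order preserves change ordering even when a later event finishes parsing before an earlier one.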

More on SynchDB

  • Open source project started in July 2024.
  • Consists of Java and C components and is based on Debezium Embedded Engine v2.6.2 – a great example for discussing C-Java integration challenges.
  • SynchDB v1.1 released on April 17, 2025
  • GitHub repository here. Documentation site here.
  • Interested? Help us improve!

P.S. I will also be presenting the same topic at the upcoming pgext.day 2025 in Montreal. If you have not yet registered for this free event, you can do so here.