Copyright © 2007-2017 JumpMind, Inc
Version 3.8.44
Permission to use, copy, modify, and distribute this SymmetricDS User Guide for any purpose and without fee is hereby granted in perpetuity, provided that the above copyright notice and this paragraph appear in all copies.
Preface
This user guide introduces SymmetricDS and its features for data synchronization. It is intended for users, developers, and administrators who want to install the software, configure synchronization, and manage its operation. Thank you to all the members of the open source community whose feedback and contributions helped us build better software and documentation. This version of the guide was generated on 2017-10-06.
1. Introduction
SymmetricDS is open source software for database and file synchronization, with support for multi-master replication, filtered synchronization, and transformation. It uses web and database technologies to replicate change data as a scheduled or near real-time operation, and it includes an initial load feature for full data loads. The software was designed to scale for a large number of nodes, work across low-bandwidth connections, and withstand periods of network outage.
1.1. System Requirements
SymmetricDS is written in Java and requires a Java Runtime Environment (JRE) Standard Edition (SE) or Java Development Kit (JDK) Standard Edition (SE) version 7.0 or above. Most major operating systems and databases are supported. See the list of supported databases in the Database Compatibility section. The minimum operating system requirements are:
-
Java SE Runtime Environment 7 or above
-
Memory - 64 (MB) available
-
Disk - 256 (MB) available
The memory, disk, and CPU requirements increase with the number of connected clients and the amount of data being synchronized. The best way to size a server is to simulate synchronization in a lower environment and benchmark data loading. However, a rule of thumb for servers is one server-class CPU with 2 GB of memory for every 500 MB/hour of data transfer and 350 clients. Multiple servers can be used as a cluster behind a load balancer to achieve better performance and availability.
SymmetricDS Pro is accessed from a web console, which requires one of the following supported web browsers:
-
Google Chrome 23 or newer
-
Internet Explorer 8 or newer
-
Mozilla Firefox 17 or newer
-
Safari 6 or newer
1.2. Overview
A node is responsible for synchronizing the data from a database or file system with other nodes in the network using HTTP. Nodes are assigned to one of the node Groups that are configured together as a unit. The node groups are linked together with Group Links to define either a push or pull communication. A pull causes one node to connect with other nodes and request changes that are waiting, while a push causes one node to connect with other nodes when it has changes to send.
Each node is connected to a database with a Java Database Connectivity (JDBC) driver using a connection URL, username, and password. While nodes can be separated across wide area networks, the database a node is connected to should be located nearby on a local area network for the best performance. Using its database connection, a node creates tables as a Data Model for configuration settings and runtime operations. The user populates configuration tables to define the synchronization and the runtime tables capture changes and track activity. The tables to sync can be located in any Catalog and Schema that are accessible from the connection, while the files to sync can be located in any directory that is accessible on the local server.
At startup, SymmetricDS looks for Node Properties Files and starts a node for each file it finds, which allows multiple nodes to run in the same instance and share resources. The property file for a node contains its external ID, node group, registration server URL, and database connection information. The external ID is the name for a node used to identify it from other nodes. One node is configured as the registration server where the master configuration is stored. When a node is started for the first time, it contacts the registration server using a registration process that sends its external ID and node group. In response, the node receives its configuration and a node password that must be sent as authentication during synchronization with other nodes.
1.3. Architecture
Each subsystem in the node is responsible for part of the data movement and is controlled through configuration. Data flows through the system in the following steps:
-
Capture into a runtime table at the source database
-
Route for delivery to target nodes and group into batches
-
Extract and transform into the rows, columns, and values needed for the outgoing batch
-
Send the outgoing batch to target nodes
-
Receive the incoming batch at the target node
-
Transform into the rows, columns, and values needed for the incoming batch
-
Load data and return an acknowledgment to the source node
- Capture
-
Change Data Capture (CDC) for tables uses database triggers that fire and record changes as comma-separated values into a runtime table called DATA. For file sync, a similar mechanism is used, except changes to the metadata about files are captured. The changes are recorded as insert, update, and delete event types. The subsystem installs and maintains triggers on tables based on the configuration provided by the user, and it can automatically detect schema changes on tables and regenerate triggers.
- Route
-
Routers run across new changes to determine which target nodes will receive the data. The user configures which routers to use and what criteria is used to match data, creating subsets of rows if needed. Changes are grouped into batches and assigned to target nodes in the DATA_EVENT and OUTGOING_BATCH tables.
- Extract
-
Changes are extracted from the runtime tables and prepared to be sent as an outgoing batch. If large objects are configured for streaming instead of capture, they are queried from the table. Special event types like "reload" for Initial Loads are also processed.
- Transform
-
If transformations are configured, they operate on the change data either during the extract phase at the source node or the load phase at the target node. The node’s database can be queried to enhance the data. Data is transformed into the tables, rows, columns, and values needed for either the outgoing or incoming batch.
- Outgoing
-
The synchronization sends batches to target nodes to be loaded. Multiple batches can be configured to send during a single synchronization. The status of the batch is updated on the OUTGOING_BATCH table as it processes. An acknowledgment is received from target nodes and recorded on the batch.
- Incoming
-
The synchronization receives batches from remote nodes and the data is loaded. The status of the batch is updated on the INCOMING_BATCH table as it processes. The resulting status of the batch is returned to the source node in an acknowledgment.
1.4. Features
SymmetricDS offers a rich set of features with flexible configuration for large scale deployment in a mixed environment with multiple systems.
-
Web UI - The web console provides easy configuration, management, and troubleshooting.
-
Data Synchronization - Change data capture for relational databases and file synchronization for file systems can be periodic or near real-time, with an initial load feature to fully populate a node.
-
Central Management - Configure, monitor, and troubleshoot synchronization from a central location where conflicts and errors can be investigated and resolved.
-
Automatic Recovery - Data delivery is durable and low maintenance, withstanding periods of downtime and automatically recovering from a network outage.
-
Secure and Efficient - Communication uses a data protocol designed for low bandwidth networks and streamed over HTTPS for encrypted transfer.
-
Transformation - Manipulate data at multiple points to filter, subset, translate, merge, and enrich the data.
-
Conflict Management - Enforce consistency of two-way synchronization by configuring rules for automatic and manual resolution.
-
Extendable - Scripts and Java code can be configured to handle events, transform data, and create customized behavior.
-
Deployment Options - The software can be installed as a self-contained server that stands alone, deployed to a web application server, or embedded within an application.
1.5. Why SymmetricDS?
SymmetricDS is a feature-rich data synchronization solution that focuses on ease of use, openness, and flexibility. The software encourages interoperability and accessibility for users and developers with the availability of source code, an application programming interface (API), and a data model supported by documentation. Configuration includes a powerful set of options to define node topology, communication direction, transformation of data, and integration with external systems. Through scripts and Java code, the user can also extend functionality with custom behavior. With a central database for setup and runtime information, the user has one place to configure, manage, and troubleshoot synchronization, with changes taking immediate effect across the network.
The trigger-based data capture system is easy to understand and widely supported by database systems. Table synchronization can be setup by users and application developers without requiring a database administrator to modify the server. Triggers are database objects written in a procedural language, so they are open for examination, and include flexible configuration options for conditions and customization. Some overhead is associated with triggers, but they perform well for applications of online transaction processing, and their benefits of flexibility and maintenance outweigh the cost for most scenarios.
Using an architecture based on web server technology, many simultaneous requests can be handled at a central server, with proven deployments in production supporting more than ten thousand client nodes. Large networks of nodes can be grouped into tiers for more control and efficiency, with each group synchronizing data to the next tier. Data loading is durable and reliable by tracking batches in transactions and retrying of faults for automatic recovery, making it a low maintenance system.
1.6. License
SymmetricDS Pro is commercial software that is licensed, not sold. It is subject to the terms of the End User License Agreement (EULA) and any accompanying JumpMind Support Contract. See the standard SymmetricDS Pro license for reference, but your agreement with JumpMind may be different.
2. Installation
SymmetricDS at its core is a web application. A SymmetricDS instance runs within the context of a web application container like Jetty or Tomcat, and uses web based protocols like HTTP to communicate with other instances.
An instance has one of the following installation options:
-
Standalone Installation - SymmetricDS is installed and run as a standalone process using the built-in Jetty web server. This is the simplest and recommended way to install an instance.
-
Web Archive (WAR) - A SymmetricDS web archive (WAR) file is deployed to an existing web application container that is separately installed, maintained and run.
-
Embedded - SymmetricDS is embedded within an existing application. In this option, a custom wrapper program is written that calls the SymmetricDS API to synchronize data.
2.1. Standalone Installation
The SymmetricDS installer is an executable jar file named symmetric-pro-<version>-setup.jar. In order to run the installer, you must have the Java Runtime Environment (JRE) version 6.0 or newer installed. Start the installer by double-clicking it (if the JRE is in your path and associated with .jar files), or by running it from a command prompt, like this:
java -jar symmetric-pro-<version>-setup.jar
The default installation will run in graphical mode, but it can also be run from a command window by
adding the -console argument on the end of the command.
The first screen is a Welcome screen that includes the SymmetricDS Pro version number. The installer will ask a series of questions before writing files to disk.
To begin selecting installation options, click Next.
Specify whether you want to install a new version of SymmetricDS for the first time (Install new software) or upgrade an existing version of SymmetricDS that was previously installed (Upgrade existing software). For upgrade, the existing installation of SymmetricDS or SymmetricDS Pro is verified before continuing.
Select the appropriate option and click Next.
Carefully read the SymmetricDS Pro License Agreement.
If you accept, select I accept the terms of this license agreement and click Next.
Choose the installation path where SymmetricDS will either be installed or upgraded. If the directory does not already exist, it will be created for you. Make sure your user has permission to write to the file system.
After entering the directory path, click Next.
Select the packages you want to install and verify disk space requirements are met. By default, all packages are selected. If you are NOT integrating SymmetricDS with Android, you can unselect the Android package.
After selecting packages, click Next.
A standalone installation can either be run automatically by the system or manually by the user. Select the Install service to run automatically checkbox to install a Windows service or Unix daemon that will start SymmetricDS when the computer is restarted. The service can installed or uninstalled later using the Control Center or command line (see Running as a Service).
Select the Run server after installing checkbox to also run SymmetricDS after installation so it can be used immediately.
After selecting options, click Next.
| We do not currently support the automatic run on Mac Operating Systems. |
For standard synchronization and web console access over HTTP, select the Enable HTTP checkbox. For encrypted synchronization and web console access over HTTPS, select the Enabled SSL checkbox.
The Java Management eXtensions (JMX) are a set of server properties and operations that can be used to manage the server.
To enable a simple web console for JMX, select the Enable JMX checkbox.
To enable remote access for JMX clients like JConsole and bin/jmx, select the Enable JMX Agent checkbox.
After selecting options and specifying unused ports, click Next.
Confirm your installation settings and click Next to begin the installation.
After SymmetricDS finishes installing, click Next.
If you chose the option for the server to start after installation, wait for it to start and then click Next.
The installation is now complete. Choose if you want to open the SymmetricDS Pro Control Center where you can view the server status and open a web console.
Click Done to exit the installer.
From the SymmetricDS Pro Control Center, you can start/stop the server, open the web console, and install/uninstall the service.
To begin configuration of SymmetricDS, check that the server is running, and then click Open Web Console.
To continue setup and configuration of SymmetricDS, refer to the Setup section.
2.2. Running as a Service
SymmetricDS can be configured to start automatically when the system boots, running as a Windows service or Linux/Unix daemon.
A wrapper process starts SymmetricDS and monitors it, so it can be restarted if it runs out of memory or exits unexpectedly.
The wrapper writes standard output and standard error to the logs/wrapper.log file.
For SymmetricDS Pro, you may have already installed as a service, so this section will show you how to manually install the service from command line.
2.2.1. Running as a Windows Service
To install the service, run the following command as Administrator:
bin\sym_service.bat install
Most configuration changes do not require the service to be re-installed. To uninstall the service, run the following command as Administrator:
bin\sym_service.bat uninstall
To start and stop the service manually, run the following commands as Administrator:
bin\sym_service.bat start
bin\sym_service.bat stop
2.2.2. Running as a Linux/Unix daemon
An init script is written to the system /etc/init.d directory.
Symbolic links are created for starting on run levels 2, 3, and 5 and stopping on run levels 0, 1, and 6.
To install the script, running the following command as root:
bin/sym_service install
Most configuration changes do not require the service to be re-installed. To uninstall the service, run the following command as root:
bin/sym_service uninstall
To start and stop the service manually, run the following commands:
bin/sym_service start
bin/sym_service stop
2.3. Clustering
A single SymmetricDS node may be deployed across a series of servers to cooperate as a cluster. A node can be clustered to provide load balancing and high availability.
When using clustering, a hardware load balancer is typically used, but a software load balancer, such as a reverse proxy, can also be used.
For clustered nodes running SymmetricDS 3.8 and later, the recommended approach is to configure the load balancer to use sticky sessions and ensure the staging directory for all nodes in the cluster are using a shared network drive.
Sticky sessions are needed to support reservation requests, which allows for nodes to connect and obtain a reservation before connecting again and pushing their changes. The shared staging directory is needed to support extract in background of the initial load, which is extracted by one node, but served by different nodes in the cluster. If the start.initial.load.extract.job property is disabled, then shared staging is not required, but the performance of the initial load may be degraded.
For clustered nodes running SymmetricDS 3.7 and earlier, it is recommended to round robin client requests to the cluster and configure the load balancer for stateless connections.
Also, the sync.url (discussed in Registration URL) SymmetricDS property should be set to the URL of the load balancer.
If the cluster will be running any of the SymmetricDS jobs, then the cluster.lock.enabled property should be set to true.
By setting this property to true, SymmetricDS will use a row in the LOCK table as a semaphore to make sure that only one instance at a time
runs a job. When a lock is acquired, a row is updated in the lock table with the time of the lock and the server id of the locking job. The lock time is set back to null
when the job is finished running. Another instance of SymmetricDS cannot acquire a lock until the locking instance (according to the server id) releases the lock. If an
instance is terminated while the lock is still held, an instance with the same server id is allowed to reacquire the lock. If the locking instance remains down, the lock can be
broken after a period of time, specified by the cluster.lock.timeout.ms property, has expired. Note that if the job is still running and the lock
expires, two jobs could be running at the same time which could cause database deadlocks.
By default, the locking server id is the hostname of the server. If two clustered instances are running on the same server, then the cluster.server.id property
may be set to indicate the name that the instance should use for its server id.
When deploying SymmetricDS to an application server like Tomcat or JBoss, no special session clustering needs to be configured for the application server.
2.4. Other Deployment Options
It is recommended that SymmetricDS is installed as a standalone service, however there are two other deployment options.
2.4.1. Web Archive (WAR)
This option means packaging a WAR file and deploying to your favorite
web server, like Apache Tomcat. It’s a little more work, but you
can configure the web server to do whatever you need. SymmetricDS can also
be embedded in an existing web application, if desired. As a web application archive, a WAR is deployed to an application server,
such as Tomcat, Jetty, or JBoss. The structure of the archive will have a web.xml
file in the WEB-INF folder, an appropriately configured symmetric.properties file in the WEB-INF/classes folder,
and the required JAR files in the WEB-INF/lib folder.
A war file can be generated using the standalone installation’s symadmin utility and the
create-war subcommand. The command requires the name of the war file to generate. It
essentially packages up the web directory, the conf directory and includes an optional
properties file. Note that if a properties file is included, it will be copied to
WEB-INF/classes/symmetric.properties. This is the same location conf/symmetric.properties
would have been copied to. The generated war distribution uses the same web.xml as the standalone
deployment.
bin/symadmin -p my-symmetric-ds.properties create-war /some/path/to/symmetric-ds.war
2.4.2. Embedded
This option means you must write a wrapper Java program that runs SymmetricDS. You would probably use Jetty web server, which is also embeddable. You could bring up an embedded database like Derby or H2. You could configure the web server, database, or SymmetricDS to do whatever you needed, but it’s also the most work of the three options discussed thus far.
The deployment model you choose depends on how much flexibility you need versus how easy you want it to be. Both Jetty and Tomcat are excellent, scalable web servers that compete with each other and have great performance. Most people choose either the Standalone or Web Archive with Tomcat 5.5 or 6. Deploying to Tomcat is a good middle-of-the-road decision that requires a little more work for more flexibility.
A Java application with the SymmetricDS Java Archive (JAR) library on its
classpath can use the SymmetricWebServer to start the server.
import org.jumpmind.symmetric.SymmetricWebServer;
public class StartSymmetricEngine {
public static void main(String[] args) throws Exception {
SymmetricWebServer node = new SymmetricWebServer(
"classpath://my-application.properties", "conf/web_dir");
// this will create the database, sync triggers, start jobs running
node.start(8080);
// this will stop the node
node.stop();
}
This example starts the SymmetricDS server on port 8080.
The configuration properties file, my-application.properties,
is packaged in the application to provide properties that override the SymmetricDS
default values. The second parameter to the constructor points to the web directory.
The default location is web. In this example the web directory is located
at conf/web_dir. The web.xml is expected to be found at conf/web_dir/WEB-INF/web.xml.
3. Setup
Once the SymmetricDS software is installed on a computer and an instance of it is running, the next step in setting up the synchronization scenario is to set up a SymmetricDS node within that running SymmetricDS instance. As a reminder, a SymmetricDS node is connected to a database or file system and is responsible for synchronizing that database’s data to other SymmetricDS nodes within the node network.
3.1. Node Type
When opening the SymmetricDS Pro web console, if there are no SymmetricDS nodes defined within the running SymmetricDS instance, the Node Setup Wizard will be displayed to guide you through the process of creating one. There are two types of SymmetricDS nodes:
-
Master Node - The master node is typically the first node created when defining a synchronization scenario and serves as the central configuration point for the synchronization scenario. All SymmetricDS configuration is stored in the database this node is attached to. This node is also typically the registration server for the synchronization scenario, accepting registration requests from other nodes and providing them the SymmetricDS configuration they need.
-
Node - All other nodes are simply designated nodes. They have all of the same responsibilities as a Master Node as far as being responsible for synchronizing a particular database or file system’s data, but have NO responsibilities for configuration or registration.
In addition to the two standard node types above, there is also an option to create a Demo installation. This Demo installation does the following:
-
Creates two fully populated H2 example databases (a server and a client) to be synchronized
-
Creates a SymmetricDS Master node within the running SymmetricDS instance attached to the server database
-
Creates a second Node within the running SymmetricDS instance attached to the client database
-
Configures SymmetricDS to synchronize the two sample databases
-
Randomly executes insert, update and delete statements on the database to show database activity
The Demo option is a great way to look at an existing synchronization scenario configuration and setup. If your goal is to begin synchronizing your own databases, select the Master or Node options from above.
Select the type of node to be created within this running SymmetricDS instance.
3.2. Master Node Setup
3.2.1. Connect to the database
The first step in setting up a master node is to connect it to the database it will be responsible for synchronizing. This database will also be used to store the SymmetricDS configuration.
-
Specify the database type
-
Specify the jdbc url. An example will be provided based on the database type selected. Be sure to replace host name, port and database name if necessary (varies by database type)
-
Specify the id to be used to login to the database
-
Specify the password to be used to login to the database
3.2.2. Select a profile
SymmetricDS has several predefined profiles that can assist in creating configuration for a synchronization scenario. These profiles create a base configuration and allow you to tweak that configuration after initial creation. The following profiles are provided:
| Standard 2 Tier |
This pattern is a central database being synchronized with many child databases of the same type. Typical examples include a central office database synchronizing to field databases (central office to stores, central office to field offices, etc.) |
| Master to Master |
Master to Master synchronization. Typically two master databases / nodes used in failover fashion |
| Multiple Source to One Target |
Many source databases synchronizing to a target database. A typical example for this pattern is operational databases to a data warehouse or data mart. |
The other options aren’t profiles, but options for configuration
- I’ll configure things myself
-
This option allows you to configure everything from scratch
- Import Existing
-
Allows you to import an existing SymmetricDS configuration file for this node
Select the profile or option and click Next.
3.2.3. Communication Settings
Provide the URL that other SymmetricDS nodes will use to connect to this master node. The default option provided should be used unless the master node will be accessed through a proxy, load balancer, or different host name than shown in the default option.
You can also select HTTPS or HTTP to be used for transferring data.
3.2.4. Console User
By default SymmetricDS uses an internal security scheme for protecting the SymmetricDS application. The admin user is the administrative user for the SymmetricDS web console. Provide a password for this user and click Next.
Additional users can be setup in the web console after the wizard completes. For more information, see Users.
SymmetricDS can also use LDAP to secure the SymmetricDS application. For more information, see LDAP.
3.2.5. Summary
The node is now ready to be installed. Click the Finish button.
3.3. Quick Config Wizard
Once the SymmetricDS software is installed, an instance of it is running, and a master node is setup, the synchronization scenario must be configured (i.e. what tables and/or files do I want to synchronize from which source node groups to which target node groups). The Quick Config wizard simplifies this process by walking you through several simple configuration steps.
The Quick Config Wizard can be found under the Configure→Overview screen.
The quick config wizard will show a screen for each router that is in place. The below example
shows the profile Standard 2 Tier with a client and server.
|
3.3.1. Router: Client to Server
Select all tables that should be synchronized in the direction
stated in the header, in this case from CLIENT to the SERVER
3.3.2. Router: Server to Client
Next we will select tables to sync from the SERVER to the CLIENT.
The item_selling_price table was selected on both wizard screens so this table
will be configured for bi-directional sync.
|
3.3.3. Configuration Summary
The summary of the configuration made by the wizard.
| Once the quick config wizard has completed your setup you can still navigate to the specific components created to provide customizations beyond the default settings. |
3.4. Node Setup
Once configuration is complete SymmetricDS will automatically pop up a Node Setup Wizard if there is only one node currently setup in your network.
See Add Node for more information.
3.5. Node Properties File
The node setup wizard generates a properties file that contains all the information a node needs to bootstrap itself.
Each node that is deployed to a server is represented by a properties file that allows it to connect to a database and
register with a parent node. Properties are configured in a file named
xxxxx.properties. It is placed in the engines directory of the SymmetricDS install. The
file is usually named according to the engine.name, but it is not a
requirement.
To give a node its identity, the following properties are required. Any
other properties found in
conf/symmetric.properties
can be overridden for a specific engine in an engine’s properties file.
If the properties are changed in
conf/symmetric.properties
they will take effect across all engines deployed to the server.
You can use the variable $(hostName) to represent the host name of the machine when defining these properties
(for example, external.id=$(hostName)). You can also access external id, engine name, node group id, sync URL, and registration URL in this manner.
(for example, engine.name=$(nodeGroupId)-$(externalId)).
|
| You can also use BSH script for the external id, engine name, node group id, sync URL, and registration URL. Use back ticks to indicate the BSH expression, and note that only one BSH expression is supporter for a given property line. The script can be prefixed or suffixed with fixed text. For example, if you wish to based the external id off of just a part of the hostname (e.g., substring of hostName): external.id=store-`import org.apache.commons.lang.StringUtils; return StringUtils.substring(hostName,2,4);\` |
- engine.name
-
This is an arbitrary name that is used to access a specific engine using an HTTP URL. Each node configured in the engines directory must have a unique engine name. The engine name is also used for the domain name of registered JMX beans.
- group.id
-
The node group that this node is a member of. Synchronization is specified between node groups, which means you only need to specify it once for multiple nodes in the same group.
- external.id
-
The external id for this node has meaning to the user and provides integration into the system where it is deployed. For example, it might be a retail store number or a region number. The external id can be used in expressions for conditional and subset data synchronization. Behind the scenes, each node has a unique sequence number for tracking synchronization events. That makes it possible to assign the same external id to multiple nodes, if desired.
- sync.url
-
The URL where this node can be contacted for synchronization. At startup and during each heartbeat, the node updates its entry in the database with this URL. The sync url is of the format:
http://{hostname}:{port}/{webcontext}/sync/{engine.name}The {webcontext} is blank for a standalone deployment. It will typically be the name of the war file for an application server deployment.
The {engine.name} can be left blank if there is only one engine deployed in a SymmetricDS server.
When a new node is first started, it is has no information about synchronizing. It contacts the registration server in order to join the network and receive its configuration. The configuration for all nodes is stored on the registration server, and the URL must be specified in the following property:
- registration.url
-
The URL where this node can connect for registration to receive its configuration. The registration server is part of SymmetricDS and is enabled as part of the deployment. This is typically equal to the value of the sync.url of the registration server.
Note that a registration server node is defined as one whose registration.urlis either blank or identical to itssync.url.For a deployment where the database connection pool should be created using a JDBC driver, set the following properties:
- db.driver
-
The class name of the JDBC driver.
- db.url
-
The JDBC URL used to connect to the database.
- db.user
-
The database username, which is used to login, create, and update SymmetricDS tables.
- db.password
-
The password for the database user.
See Startup Parameters, for additional parameters that can be specified in the engine properties file.
4. Configuration
Configuring SymmetricDS is the process of setting up your synchronization scenario.
The Configure Overview screen provides a summary of what is currently configured in this master node. The main table on this screen shows configuration elements (Groups, Group Links, Routers, etc.) and the number of items configured for each element. As an example, in the sample screenshot below, there are 2 Groups defined, 2 Group Links defined, 2 Routers defined, 9 Channels, defined, etc. Certain configuration elements are required to configure the synchronization scenario. If these elements are not configured, a warning symbol will be displayed next to the element name.
In addition to using the Quick Config Wizard to configure the synchronization solution, you can also configure SymmetricDS configuration elements one by one using the links on the left hand side of the Configure Overview screen. Each configuration element is described in the following sections of this guide.
Configuration can also be imported from and exported to external files by pressing the Import and Export buttons from the Configure Overview screen.
4.1. Import
There are two ways a SymmetricDS configuration can be imported. Both ways support importing an export that is in CSV format. In order to import the SQL format you would run the SQL via database specific tools.
-
During the creation of the master node, while selecting a profile
Figure 3. Import Existing configuration during the node creation -
Import a configuration through the web console
This will wipe out all existing configuration settings and replace with the newly imported ones.
4.2. Export
Export your configuration at any time through the web console. Use the drop down to select the format you would like to use for the export.
Exporting your configuration might be useful in order to save a configuration in source control or to be used as an import into another environment. Export and Import are frequently used to migrate configuration between test and production environments.
4.3. Groups
In SymmetricDS, configuration rules are applied to groups of nodes versus individual nodes. A group is a categorization of nodes with similar synchronization needs. For example, in a synchronization scenario where a corporate office database is synchronized with field office databases, two node groups would be created, one for the corporate office database (Corporate), and one for the field office databases (Field_office). In the corporate group, there would be a single node and database. In the field_office group, there would be many nodes and databases, one for each field office. Configuration rules/elements are applied to the node group versus the individual nodes in order to simplify the configuration setup (no need to configure each individual field office node, just how the field office nodes sync with the corporate office node).
- Group ID
-
Unique identifier for the group.
- Description
-
Description of the group that is available through the console.
Basic Two Tier |
Server, Client |
Retail |
Corp, Store, Register,Handheld |
Field Office |
Corporate, Field_office |
Failover |
Master, Failover |
4.4. Group Links
Group links define at a high level how data moves throughout your synchronization scenario. The group link defines which node groups will synchronize data to other node groups and within that exchange, which node group will initiate the conversation for that exchange.
- Source Group ID
-
The source group of the communication link.
- Link
-
Defines how the source and target groups will communicate.
Push [P] |
Indicates that nodes in the source node group will initiate communication over an HTTP PUT and push data to nodes in the target node group. |
Wait for Pull [W] |
Indicates nodes in the source node group will wait for a node in the target node group to connect via an HTTP GET and allow the nodes in the target node group to pull data from the nodes in the source node group. |
Route-only [R] |
Route-only indicates that the data isn’t exchanged between nodes in the source and nodes in the target node groups via SymmetricDS. This action type might be useful when using an XML publishing router or an audit table changes router. |
- Target Group ID
-
The target group of the communication link.
- Sync Configuration
-
Determines if configuration is also sent through this group link. By default this is checked and configuration will communicate on this path. There are configurations that might cause configuration to continuously loop through the network as a result this might need to be unchecked for some links.
- Reversible
-
Allows the communication link to send in the reverse direction if specified on the channel. A push link can be overridden to pull and a pull link can be overridden to push using a setting on the channel.
| Basic |
Common two tier corp and store store bi-directional setup |
| Firewall |
All communication is initiated from corp group. This is a common way to setup synchronization when the corp group is behind a firewall. |
| Peer to Peer |
Allows a node to sync to all other nodes in the same group. |
| Loop |
Turn sync configuration off on group link c to a so that configuration does not loop. |
|
Group Links Tips and Tricks
The link also defines if configuration data will be synchronized on the link. For example, you might not want remote nodes to be able to change configuration and effect other nodes in the network. In this case you would uncheck sync configuration.
|
| Turn off sync configuration option where appropriate in non hierarchical setup so that configuration changes to not create a loop. See "No Config" example above. |
4.5. Routers
Routers ride on top of group links. While a group link specifies that data should be moved from nodes in a source node group to nodes in a target node group, routers define more specifically which captured data from a source node should be sent to which specific nodes in a target node group, all within the context of the node group link.
- Router Id
-
Unique description of a specific router
- Group Link
-
The group link used for the source and target node groups of this router
- Router Type
-
The type of router. Standard router types are listed below. Custom routers can be configured as extension points.
| Type | Provided | Description |
|---|---|---|
default |
x |
A router that sends all captured data to all nodes that belong to the target node group defined in the router. See Default Router |
column |
x |
A router that compares old or new column values in a captured data row to a constant value or the value of a target node’s external id or node id. See Column Match Router |
audit |
x |
A router that inserts into an automatically created audit table. It records captured changes to tables that it is linked to. See Audit Table Router |
java |
x |
A router that executes a Java expression in order to select nodes to route to. The script can use the old and new column values. See [Java Router] |
lookuptable |
x |
A router which can be configured to determine routing based on an existing or ancillary table specifically for the purpose of routing data. See Lookup Table Router |
subselect |
x |
A router that executes a SQL expression against the database to select nodes to route to. This SQL expression can be passed values of old and new column values. See Subselect Router |
bsh |
x |
A router that executes a Bean Shell script expression in order to select nodes to route to. The script can use the old and new column values. See Beanshell Router |
- Router Expression
-
An expression that is specific to the type of router that is configured in router type. See the documentation for each router for more details.
- Use Source Catalog/Schema
-
If set then the source catalog and source schema are sent to the target to be used to find the target table.
- Target Catalog
-
Optional name of catalog where a target table is located. If this field is unspecified, the catalog will be either the default catalog at the target node or the "source catalog name" from the table trigger, depending on how "use source catalog schema" is set for the router. Variables are substituted for
$(sourceNodeId),$(sourceExternalId),$(sourceNodeGroupId),$(targetNodeId),$(targetExternalId),$(targetNodeGroupId), and$(none). - Target Schema
-
Optional name of schema where a target table is located. If this field is unspecified, the schema will be either the default schema at the target node or the "source schema name" from the table trigger, depending on how "use source catalog schema" is set for the router. Variables are substituted for
$(sourceNodeId),$(sourceExternalId),$(sourceNodeGroupId),$(targetNodeId),$(targetExternalId),$(targetNodeGroupId), and$(none).
- Sync on Update
-
Flag that indicates that this router should send updated rows from nodes in the source node group to nodes in the target node group.
- Sync on Insert
-
Flag that indicates that this router should send inserted rows from nodes in the source node group to nodes in the target node group.
- Sync on Delete
-
Flag that indicates that this router should send deleted rows from nodes in the source node group to nodes in the target node group.
- Target Table
-
Optional name for a target table. Only use this if the target table name is different than the source.
-
A router that is not linked to any triggers.
-
A router that is linked to a single trigger.
-
A router that is linked to multiple triggers.
4.5.1. Router Types
Default Router
The simplest router is a router that sends all the data that is captured by its associated triggers to all the nodes that belong to the target node group defined in the router.
Column Match Router
Sometimes requirements may exist that require data to be routed based on the current value or the old value of a column in the table that is being routed.
The first part of the expression is always the column name. The column name should always be defined in upper case. The upper case column name prefixed by OLD_ can be used for a comparison being done with the old column data value.
The second part of the expression can be a constant value, a token that represents another column, or a token that represents some other SymmetricDS concept. Token values always begin with a colon (:).
-
Consider a table that needs to be routed to all nodes in the target group only when a status column is set to 'READY TO SEND.'
STATUS=READY TO SEND
-
Consider a table that needs to be routed to all nodes in the target group only when a status column changes values.
| The use of OLD_STATUS, where the OLD_ prefix gives access to the old column value. |
STATUS!=:OLD_STATUS
|
Attributes on a NODE that can be referenced with the following tokens
|
-
Consider a table that needs to be routed to only nodes in the target group whose STORE_ID column matches the external id of a node.
STORE_ID=:EXTERNAL_ID
-
Consider a table that needs to be routed to a redirect node defined by its external id in the REGISTRATION_REDIRECT table.
STORE_ID=:REDIRECT_NODE
-
More than one column may be configured in a router_expression. When more than one column is configured, all matches are added to the list of nodes to route to. The following is an example where the STORE_ID column may contain the STORE_ID to route to or the constant of ALL which indicates that all nodes should receive the update.
STORE_ID=ALL or STORE_ID=:EXTERNAL_ID
-
The NULL keyword may be used to check if a column is null. If the column is null, then data will be routed to all nodes who qualify for the update. This following is an example where the STORE_ID column is used to route to a set of nodes who have a STORE_ID equal to their EXTERNAL_ID, or to all nodes if the STORE_ID is null.
STORE_ID=NULL or STORE_ID=:EXTERNAL_ID
-
External data collected as part of the trigger firing (see External Select) can also be used as a virtual column in the router expression as well.
EXTERNAL_DATA=:EXTERNAL_ID
Audit Table Router
This router audits captured data by recording the change in an audit table that the router creates and keeps up to date. The router creates a table named the same as the table for which data was captured with the suffix of _AUDIT. It will contain all of the same columns as the original table with the same data types only each column is nullable with no default values.
| The following parameter must be set to true so that the audit table can be created. |
auto.config.database=true
| AUDIT_ID |
the primary key of the table. |
| AUDIT_TIME |
the time at which the change occurred. |
| AUDIT_EVENT |
the DML type that happened to the row. |
| The audit router must be associated with a node group link of type 'R'. The 'R' stands for 'only routes to' (see Group Links). |
Lookup Table Router
A lookup table may contain the id of the node where data needs to be routed. This could be an existing table or an ancillary table that is added specifically for the purpose of routing data.
|
Each of the following configuration parameters are required.
|
| The lookup table will be read into memory and cached for the duration of a routing pass for a single channel. |
Consider a table that needs to be routed to a specific store, but the data in the changing table only contains brand information. In this case, the STORE table may be used as a lookup table.
LOOKUP_TABLE=STORE KEY_COLUMN=BRAND_ID LOOKUP_KEY_COLUMN=BRAND_ID EXTERNAL_ID_COLUMN=STORE_ID
Subselect Router
Sometimes routing decisions need to be made based on data that is not in the current row being synchronized. A 'subselect' router can be used in these cases. A 'subselect' is configured with a router expression that is a SQL select statement which returns a result set of the node ids that need routed to. Column tokens can be used in the SQL expression and will be replaced with row column data.
| The overhead of using this router type is high because the 'subselect' statement runs for each row that is routed. It should not be used for tables that have a lot of rows that are updated. It also has the disadvantage that if the data being relied on to determine the node id has been deleted before routing takes place, then no results would be returned and routing would not happen. |
The router expression you specify is appended to the following SQL statement in order to select the node ids:
select c.node_id
from sym_node c
where c.node_group_id=:NODE_GROUP_ID
and c.sync_enabled=1 and ...
As you can see, you have access to information about the node currently under consideration for routing through the 'c' alias, for example c.external_id . There are two node-related tokens you can use in your expression:
-
:NODE_GROUP_ID
-
:EXTERNAL_DATA
Column names representing data for the row in question are prefixed with a colon as well., for example: :EMPLOYEE_ID, or :OLD_EMPLOYEE_ID. Here, the OLD_ prefix indicates the value before the change in cases where the old data has been captured.
For an example, consider the case where an Order table and an OrderLineItem table need to be routed to a specific store. The Order table has a column named order_id and STORE_ID. A store node has an external_id that is equal to the STORE_ID on the Order table. OrderLineItem, however, only has a foreign key to its Order of order_id. To route OrderLineItems to the same nodes that the Order will be routed to, we need to reference the master Order record.
There are two possible ways to solve this in SymmetricDS.
-
Configure a 'subselect' router type (shown below).
-
Use an external select to capture the data via a trigger for use in a column match router, see External Select.
c.external_id in (select STORE_ID from order where order_id=:ORDER_ID)
| In this example that the parent row in Order must still exist at the moment of routing for the child rows (OrderLineItem) to route, since the select statement is run when routing is occurring, not when the change data is first captured. |
Beanshell Router
When more flexibility is needed in the logic to choose the nodes to route to, then the a scripted router may be used. The currently available scripting language is Bean Shell. Bean Shell is a Java-like scripting language. Documentation for the Bean Shell scripting language can be found at http://www.beanshell.org .
The router type for a Bean Shell scripted router is 'bsh'. The router expression is a valid Bean Shell script that:
nodes |
Collection of org.jumpmind.symmetric.model.Node objects the router would route to normally. |
nodeIds |
Collection of node ids that the router would route to normally. You can just return this if you want the bsh router to behave like the default router. |
targetNodes |
Collection of org.jumpmind.symmetric.model.Node objects to be populated and returned. |
engine |
The instance of org.jumpmind.symmetric.ISymmetricEngine which has access to SymmetricDS services. |
Any Data Column |
Data column values are bound to the script evaluation as Java object representations of the column data. The columns are bound using the uppercase names of the columns. Example if store_id is a column then STORE_ID is a variable name available in Bean Shell script. |
Any Old Values |
Old Data column values are bound to the script evaluation as Java object representations of the column data. The columns are bound using the uppercase representations that are prefixed with 'OLD_'. Example if store_id is a column then OLD_STORE_ID is a variable name available in Bean Shell script representing the old value for the store_id before the change. |
targetNodes |
Collection of org.jumpmind.symmetric.model.Node objects that will be routed to. |
true |
All nodes should be routed |
false |
No nodes should be routed |
| The last line of a bsh script is always the return value. |
targetNodes.add(STORE_ID + "-" + WORKSTATION_NUMBER)
The same could also be accomplished by simply returning the node id.
STORE_ID + "-" + WORKSTATION_NUMBER
FLAG != null && !FLAG.equals(OLD_FLAG)
| Here we make use of OLD_, which provides access to the old column value. |
for(org.jumpmind.symmetric.model.Node node : nodes) {
if (STATION != null && node.getExternalId().equals(STATION.trim())) {
targetNodes.add(node.getNodeId());
}
}
4.6. Channels
Once group links and routers are defined, configuration must be completed to specify which data (tables, file systems, etc.) should be synchronized over those links and routers. The next step in defining which specific data in the database is moved is to define logical groupings for that data. Channels define those logical groupings. As an example, a set of tables that hold customer data might be logically grouped together in a Customer channel. Sales, returns, tenders, etc. (transaction data) might be logically grouped into a transaction channel. A default channel is automatically created that all tables will fall into unless other channels are created and specified. The default channel is called 'default'.
Channels can be disabled, suspended, or scheduled as needed.
| Transactions will NOT be preserved across channels so its important to setup channels to contain all tables that participate in a given transaction. |
- Channel ID
-
Identifier used through the system to identify a given channel.
- Processing Order
-
Numeric value to determine the order in which a channel will be processed. Channels will be processed in ascending order.
- Batch Algorithm
-
Batching is the grouping of data, by channel, to be transferred and committed at the client together.
Default |
All changes that happen in a transaction are guaranteed to be batched together. Multiple transactions will be batched and committed together until there is no more data to be sent or the max_batch_size is reached. |
Transactional |
Batches will map directly to database transactions. If there are many small database transactions, then there will be many batches. The max_batch_size column has no effect. |
Nontransactional |
Multiple transactions will be batched and committed together until there is no more data to be sent or the max_batch_size is reached. The batch will be cut off at the max_batch_size regardless of whether it is in the middle of a transaction. |
- Max Batch Size
-
Specifies the maximum number of data events to process within a batch for this channel.
- Max Batch To Send
-
Specifies the maximum number of batches to send for a given channel during a 'synchronization' between two nodes. A 'synchronization' is equivalent to a push or a pull. For example, if there are 12 batches ready to be sent for a channel and max_batch_to_send is equal to 10, then only the first 10 batches will be sent even though 12 batches are ready.
- Max Data To Route
-
Specifies the maximum number of data rows to route for a channel at a time.
- Max KB/s
-
Specifies the maximum network transfer rate in kilobytes per second. Use zero to indicate unlimited. When throttling the channel, make sure the channel is on its own queue or within a queue of channels that are throttled at the same rate. This is currently only implemented when staging is enabled.
- Data Loader Types
-
Determines how data will be loaded into the target tables. These are used during an initial load or a reverse initial load. Data loaders do not always have to load into the target relational database. They can write to a file, a web service, or any other type of non-relational data source. Data loaders can also use other techniques to increase performance of data loads into the target relation database.
default |
Performs an insert first and if this fails will fall back to an update to load the data. |
ftp_localhost |
Sends the data in CSV format to a configured ftp location. These locations are setup in the TODO {SYM_HOME}/conf/ftp-extensions.xml |
bulk |
Assigns the appropriate bulk loader to this channel. Supported bulk loaders include: Microsoft SQL, PostgreSQL, MySQL and Amazon Redshift over S3. |
mongodb |
MongoDB data loader. |
| Tables that should be data loaded should be configured to use this channel. Many times, a reload channel will be set to bulk load to increase the performance of an initial load. |
- Group Link Direction
-
For a node group link that is reversible, the channel can specify either "push" or "pull" to override the default group link communication. If this field is empty, the default group link communication is used.
- Enabled
-
Indicates whether the channel is enabled or disabled. If a channel is disabled, data is still captured for changes that occur on the source system, but it will not be routed and sent to the target until the channel is re-enabled.
- Reload Channel
-
Indicates whether a channel is available for initial loads and reverse initial loads.
- File Sync Channel
-
Indicates whether a channel is available for file synchronization.
- Use Old Data To Route
-
Indicates if the old data will be included for routing. Routing can then use this data for processing. Defaults to true.
- Use Row Data To Route
-
Indicates if the current data will be included for routing. Routing can then use this data for processing. Defaults to true.
- Use Primary Key (PK) Data to Route
-
Indicates if the primary key data will be include for routing. For example maybe a store ID is needed to apply logic on before sending to the appropriate target nodes. Defaults to true.
- Tables Contain Big Lobs
-
Indicates whether the channel contains big lobs. Some databases have shortcuts that SymmetricDS can take advantage of if it knows that the lob columns in SYM_DATA aren’t going to contain large lobs. The definition of how large a 'big' lob is varies from database to database.
TODO
|
Channel Tips and Tricks
Increase performance by creating designated channels for tables that use LOB data types. For these channels be sure to check the "Table Contains Big Lobs" to increase performance.
|
4.7. Table Triggers
The next step in defining the synchronization scenario is to define which tables should be synchronized. The simplest way to do this is by using the Auto Create… button on the Configure → Table Triggers screen. The Auto Create functionality introspects the master node’s database and looks for available tables to synchronize.
Note that you will be configuring tables to synchronize for specific Routers. You will get a chance to select tables for each configured Router. Select the channel you desire from the dropdown, select the tables that should be synchronized using that channel, and then click the Next button. A summary will be presented at the end of the configuration. Click Finish to create the triggers.
The Auto Create functionality can be used repeatedly to select different table, router and channel combinations. If you are configuring a new set of tables triggers that should use an independent Trigger definition be sure to enter values for either the Trigger Prefix or the Trigger Suffix fields. These will be used to create a unique Trigger Id.
For each table selected in the Auto Create wizard, a corresponding Table Trigger configuration element will be added that can be maintained within the Table Triggers list screen. New Table Trigger configuration elements can also be created manually from this screen.
- Trigger Id
-
Unique identifier for a trigger.
- Source Catalog
-
Optional name for the catalog the configured table is in. If the name includes * then a wildcard match on the table name will be attempted. \ Wildcard names can include a list of names that are comma separated. The ! symbol may be used to indicate a NOT match condition.
- Source Schema
-
Optional name for the schema a configured table is in. If the name includes * then a wildcard match on the table name will be attempted. Wildcard names can include a list of names that are comma separated. The ! symbol may be used to indicate a NOT match condition.
- Source Table
-
The name of the source table that will have a trigger installed to watch for data changes. See Trigger Wildcards for using wildcards to specify multiple source tables.
- Channel
-
The channel_id of the channel that data changes will flow through.
- Sync On Insert
-
Determines if changes will be captured for inserts.
- Sync On Update
-
Determines if changes will be captured for updates.
- Sync On Delete
-
Determines if changes will be captured for deletes.
- Reload Channel Id
-
The channel_id of the channel that will be used for initial loads.
- Sync On Insert Condition
-
Specify a condition for the insert trigger firing using an expression specific to the database. On most platforms, it is added to an "IF" statement in the trigger text. On SQL-Server it is added to the "WHERE" clause of a query for inserted/deleted logical tables. See Sync Condition Example.
- Sync On Update Condition
-
Specify a condition for the update trigger firing using an expression specific to the database. On most platforms, it is added to an "IF" statement in the trigger text. On SQL-Server it is added to the "WHERE" clause of a query for inserted/deleted logical tables. See Sync Condition Example.
- Sync On Delete Condition
-
Specify a condition for the delete trigger firing using an expression specific to the database. On most platforms, it is added to an "IF" statement in the trigger text. On SQL-Server it is added to the "WHERE" clause of a query for inserted/deleted logical tables. See Sync Condition Example.
- Sync Condition Example
-
Sync Conditions can access both old values and new values of a field/column using "old_" and "new_" respectively. For example, if your column is id and your condition checks the value coming in to be 'test', your condition will be:
new_id = 'test'
- Custom Insert Trigger Text
-
Specify insert trigger text (SQL) to execute after the SymmetricDS trigger fires. This field is not applicable for H2, HSQLDB 1.x or Apache Derby.
- Custom Update Trigger Text
-
Specify update trigger text (SQL) to execute after the SymmetricDS trigger fires. This field is not applicable for H2, HSQLDB 1.x or Apache Derby.
- Custom Delete Trigger Text
-
Specify delete trigger text (SQL) to execute after the SymmetricDS trigger fires. This field is not applicable for H2, HSQLDB 1.x or Apache Derby.
- Sync On Incoming
-
Whether or not an incoming batch that loads data into this table should cause the triggers to capture data_events. Be careful turning this on, because an update loop is possible.
- Stream Lobs
-
Specifies whether to capture lob data as the trigger is firing or to stream lob columns from the source tables using callbacks during extraction. A value of 1 indicates to stream from the source via callback; a value of 0, lob data is captured by the trigger.
- Capture Lobs
-
Provides a hint as to whether this trigger will capture big lobs data. If set to 1 every effort will be made during data capture in trigger and during data selection for initial load to use lob facilities to extract and store data in the database. On Oracle, this may need to be set to 1 to get around 4k concatenation errors during data capture and during initial load.
- Capture Old Data
-
Indicates whether this trigger should capture and send the old data (previous state of the row before the change).
- Stream Row
-
Captures only the primary key when the trigger fires which can reduce the overhead of the trigger on tables with lots of columns. The data will then be queried using the PK values captured when the batch is ready for extraction.
- External Select
-
Specify a SQL select statement that returns a single row, single column result. It will be used in the generated database trigger to populate the EXTERNAL_DATA field on the data table. See
- Excluded Column Names
-
Specify a comma-delimited list of columns that should not be synchronized from this table.
- Included Column Names
-
Specify a comma-delimited list of columns only should be synchronized from this table.
- Sync Key Names
-
Specify a comma-delimited list of columns that should be used as the key for synchronization operations. By default, if not specified, then the primary key of the table will be used.
- Channel Expression
-
An expression that will be used to capture the channel id in the trigger. This expression will only be used if the channel_id is set to 'dynamic.'
|
Multiple Triggers On A Table
Note that many databases allow for multiple triggers of the same type to be defined. Each database defines the order in which the triggers fire differently.
If you have additional triggers beyond those SymmetricDS installs on your table, please consult your database documentation to determine if
there will be issues with the ordering of the triggers.
|
|
Quick Config Wizard
Use the Quick Config Wizard to setup a complete synchronization scenario with default trigger settings.
|
|
Auto Create Triggers
By using the "Auto Create" button the on the "Table Triggers" screen you can create all your triggers through a single dialog.
|
|
Capture Changed Data
When determining whether a data change has occurred or not, by default the triggers will record a change even if the data was updated to the same value(s) they were originally. For example, a data change will be captured if an update of one column in a row updated the value to the same value it already was. There is a global property that allows you to override this behavior (defaults to false). trigger.update.capture.changed.data.only.enabled=false This property is currently only supported on MySQL, DB2, SQL Server, and Oracle. |
4.7.1. Trigger Wildcards
The source table name may contain the asterisk ('*') wildcard character so that one trigger entry can define synchronization for many tables.
-
If multiple wildcard tokens are are supplied they should be deliminated with a comma.
-
Tokens are always evaluated from left to right.
-
When a table match is made, the table is either added to or removed from the list of tables. If another trigger already exists for a table, then that table is not included in the wildcard match (the explicitly defined trigger entry take precendence).
-
System tables and any tables that start with the SymmetricDS table prefix will be excluded.
-
A wildcard token can also start with an exclamation ('!') to indicate an exclusive match.
4.7.2. External Select
Occasionally, you may find that you need to capture and save away a piece of data present in another table when a trigger is firing. This data is typically needed for the purposes of determining where to 'route' the data to once routing takes place. Each trigger definition contains an optional "external select" field which can be used to specify the data to be captured. Once captured, this data is available during routing in DATA 's external_data field.
For these cases, place a SQL select statement which returns the data item you need for routing in external_select.
| The external select SQL must return a single row, single column |
select STORE_ID from order where order_id=$(curTriggerValue).$(curColumnPrefix)order_id
$(curTriggerValue) |
Variable to be replaced with the NEW or OLD column alias provided by the trigger context, which is platform specific. For insert and update triggers, the NEW alias is used; for delete triggers, the OLD alias is used. For example, "$(curTriggerValue).COLUMN" becomes ":new.COLUMN" for an insert trigger on Oracle. |
$(curColumnPrefix) |
Variable to be replaced with the NEW_ or OLD_ column prefix for platforms that don’t support column aliases. This is currently only used by the H2 database. All other platforms will replace the variable with an empty string. For example "$(curColumnPrefix)COLUMN" becomes "NEW_COLUMN" on H2 and "COLUMN" on Oracle. |
| External select SQL statements should be used carefully as they will cause the trigger to run the additional SQL each time the trigger fires. |
| Using an external select on the trigger is similar to using the 'subselect' router. The advantage of this approach over the 'subselect' approach is that it guards against the (somewhat unlikely) possibility that the master Order table row might have been deleted before routing has taken place. This external select solution also is a bit more efficient than the 'subselect' approach. |
4.7.3. Load Only Triggers
Occasionally the decision of what data to load initially results in additional triggers. These triggers, known as load only triggers, are configured such that they do not capture any data changes. In other words, the sync on insert, sync on update, and sync on delete attributes of the trigger are all set to false.
-
Initial Loads
-
Reverse Initial Loads
-
Table Reloads
-
Creation of tables during initial loads
-
To load a read-only lookup table, for example. It could also be used to load a table that needs populated with example or default data.
-
Recovery of data for tables that have a single direction of synchronization. For example, a retail store records sales transactions that synchronize in one direction by trickling back to the central office. If the retail store needs to recover all the sales transactions from the central office, they can be sent are part of an initial load from the central office by setting up a load only trigger that "sync" in that direction.
4.8. Table Routing
As discussed previously, Routers define more specifically which captured data from a source node should be sent to which specific nodes in a target node group. Table Triggers define which tables should be synchronized. Table Routing defines how the Routers and Table Triggers work together, specifying what tables are associated with each router (and thus the direction the table data flows).
The simplest way to set up Table Routing is to use the Auto Create… button.
The default entry point for the auto creation is from the triggers perspective. This will allow you to select table triggers first then routers.
Clicking Next to select from available routers.
Select the routers (direction) to associate with the previously selected triggers.
Click Save to install the triggers on the database and begin data capture.
You can also select routers on the first screen of the auto create if you would like to select routers first then triggers. Both approaches will result in the same table routing.
Select from available routers.
Clicking Next to select from available table triggers.
Select the table triggers to associate with the previously selected routers.
Click Save to install the triggers on the database and begin data capture.
The table routing elements created by the wizard can be maintained from the Table Routing List Screen. New table routing entries can also be created manually from this screen if desired.
| SymmetricDS will not install table triggers on the database until they are associated with a router through table routing. |
| Once a table routing relationship is created data capture will begin. |
- Table Trigger
-
The table trigger to link.
- Router
-
The router to link.
- Initial Load Select
-
SQL used as part of the WHERE clause on a SQL statement during the initial load process to extract data from the source node of the router. If blank all rows will be selected. If you want no rows to load during initial load you can set the expression to 1=0 or set Initial Load Order to a negative number.
- Initial Load Delete
-
SQL used as part of the WHERE clause on a SQL statement during the initial load process to delete data on the target node of the router.
Initial Load Delete SQL will only be used if the following parameter is true (default is false).
initial.load.delete.first=true
- Initial Load Order
-
Order sequence of this table when an initial load is sent to a node. If this value is the same for multiple tables, then SymmetricDS will attempt to order the tables according to FK constraints. If this value is set to a negative number, then the table will be excluded from an initial load.
- Initial Load Batch Count
-
Only applicable if the initial load extract job is enabled. The number of batches to split an initial load of a table across. If 0 then a select count(*) will be used to dynamically determine the number of batches based on the max_batch_size of the reload channel.
- Enabled
-
Each individual trigger-router combination can be disabled or enabled if needed. By default, a trigger router is enabled, but if you have a reason you wish to define a trigger router combination prior to it being active, you can uncheck the enabled box. This will cause the trigger-router mapping to be sent to all nodes, but the trigger-router mapping will not be considered active or enabled for the purposes of capturing data changes or routing.
- Ping Back Enabled
-
SymmetricDS, by default, avoids circular data changes. When a trigger fires as a result of SymmetricDS itself (such as the case when sync on incoming batch is set), it records the originating source node of the data change in source_node_id. During routing, if routing results in sending the data back to the originating source node, the data is not routed by default. If instead you wish to route the data back to the originating node, you can check the ping back enabled column. This will cause the router to "ping" the data back to the originating node when it usually would not.
4.9. File Triggers
In addition to supporting database synchronization, SymmetricDS also supports File Synchronization. Similar to database synchronization which allows configuring [Table Triggers], SymmetricDS also supports setting up File Triggers. A file trigger is equivalent to specifying a directory structure or path that should be "watched" for files that need to be synchronized.
- File Trigger Id
-
Unique identifier for a trigger.
- Channel
-
The channel_id of the channel that data changes will flow through.
- Reload Channel Id
-
The channel_id of the channel that will be used for reloads.
- Base Directory
-
The base directory on the source node that files will be synchronized from.
- Recurse
-
Whether to synchronize child directories.
- Include Files
-
Wildcard-enabled (*), comma-separated list of file to include in synchronization.
- Exclude Files
-
Wildcard-enabled (*), comma-separated list of file to exclude from synchronization.
- Sync On Create
-
Whether to capture and send files when they are created.
- Sync On Modified
-
Whether to capture and send files when they are modified.
- Sync On Delete
-
Whether to capture and send files when they are deleted.
- Sync On Ctl File
-
Combined with sync_on_create, determines whether to capture and send files when a matching control file exists. The control file is a file of the same name with a '.ctl' extension appended to the end.
- Delete After Sync
-
Determines whether to delete the file after it has synced successfully.
- Before Copy Script
-
A beanshell script that is run at the target node right before the file copy to it’s destination directory.
- After Copy Script
-
A beanshell script that is run at the target node right after the file copy to it’s destination directory.
4.9.1. BeanShell Scripts
There are two types of Bean Shell scripts that can be leveraged to customize file synchronization behavior:
- Before copy script
-
This runs on delivery of a file before it is copied to it’s target location
- After copy script
-
This run on delivery of a file after it is copied to it’s target location
Each of these scripts have access to local variables that can be read or set to affect the behavior of copying files.
- targetBaseDir
-
The preset base directory as configured in file trigger or overwritten in file trigger router. This variable can be set by the before copy script to set a different target directory.
- targetFileName
-
The name of the file that is being synchronized. This variable can be overwritten by the before copy script to rename a file at the target.
- targetRelativeDir
-
The name of a directory relative to the target base directory to which the target file will be copied. The default value of this variable is the relative directory of the source. For example, if the source base directory is /src and the target base directory is /tgt and the file /src/subfolder/1.txt is changed, then the default targetRelativeDir will be subfolder. This variable can be overwritten by the before_copy_script to change the relative directory at the target. In the above example, if the variable is set to blank using the following script, then the target file will be copied to /tgt/1.txt.
targetRelativeDir = "";
- processFile
-
This is a variable that is set to true by default. A custom before copy script may process the file itself and set this variable to false to indicate that the file should NOT be copied to its target location.
- sourceFileName
-
This is the name of the file.
- sourceFilePath
-
This is the path where the file can be found relative to the batch directory.
- batchDir
-
This is the staging directory where the batch has been extracted. The batchDir + sourceFilePath + sourceFileName can be used to locate the extracted file.
- engine
-
This is the bound instance of the ISymmetricEngine that is processing a file. It gives access to all of the APIs available in SymmetricDS.
- sourceNodeId
-
This is a bound variable that represents the nodeId that is the source of the file.
- log
-
This is the bound instance of an org.slf4j.Logger that can be used to log to the SymmetricDS log file.
File file = new File(batchDir + "/" + sourceFilePath + "/" + sourceFileName);
if (file.exists()) {
String path = file.getAbsolutePath();
cp (path,"/backup/" + sourceFileName);
}
4.10. File Routing
The file routing concept is identical to the Table Routing concept except that it is for files versus database tables. The File Triggers specify which directory structures on the source node to watch, and the Routers define where the data is sent based on the configured router. Similar to Table Routing the simplest way to set up file routing is via the use of the Auto Create button.
The default entry point for the auto creation is from the file triggers perspective. This will allow you to select file triggers first then routers.
Click Next to select from available routers.
Select the routers (direction) to associate with the previously selected file triggers.
Click Save to setup the file triggers and begin file sync.
You can also select routers on the first screen of the auto create if you would like to select routers first then file triggers. Both approaches will result in the same file routing.
Select from available routers.
Click Next to select from available file triggers.
Select the file triggers to associate with the previously selected routers.
Click Save to setup the file routers and begin file sync.
The file routing elements created by the wizard can be maintained from the File Routing List Screen. New file routing entries can also be created manually from this screen if desired.
| SymmetricDS will not create file triggers until they are associated with a router through file routing. |
| Once a file routing relationship is created changes on the base directory will be captured and file sync will begin. |
- File Triggers
-
The file trigger to link.
- Routers
-
The router to link.
- Target Base Directory
-
The base directory on the target node that files will be synchronized to.
- Conflict Strategy
-
The strategy to employ when a file has been modified at both the client and the server.
source_wins |
The source file will be used when a conflict occurs. |
target_wins |
The target file will be used when a conflict occurs. |
manual |
If a conflict occurs the batch will be put in ER (error) status and require manual intervention to resolve the issue. |
- Initial Enabled
-
Indicates whether this file trigger should be initial loaded.
- Enabled
-
Indicates whether this file trigger router is enabled or not.
4.11. Conflicts
Conflict detection is the act of determining if an insert, update or delete is in "conflict" due to the target data row not being consistent with the data at the source prior to the insert/update/delete.
-
Detection
-
Resolution
-
Ping Back
- Conflict Id
-
Unique identifier for a specific conflict detection setting.
- Group Link
-
References a node group link.
- Detection Type
-
Indicates the strategy to use for detecting conflicts during a dml action.
| Conflicts are detected while data is being loaded into a target system. |
USE_PK_DATA |
Indicates that only the primary key is used to detect a conflict. If a row exists with the same primary key, then no conflict is detected during an update or a delete. Updates and deletes rows are resolved using only the primary key columns. If a row already exists during an insert then a conflict has been detected. |
USE_CHANGED_DATA |
Indicates that the primary key plus any data that has changed on the source system will be used to detect a conflict. If a row exists with the same old values on the target system as they were on the source system for the columns that have changed on the source system, then no conflict is detected during an update or a delete. If a row already exists during an insert then a conflict has been detected. |
USE_OLD_DATA |
Indicates that all of the old data values are used to detect a conflict. Old data is the data values of the row on the source system prior to the change. If a row exists with the same old values on the target system as they were on the source system, then no conflict is detected during an update or a delete. If a row already exists during an insert then a conflict has been detected. |
USE_TIMESTAMP |
Indicates that the primary key plus a timestamp column (as configured in detect_expression ) will indicate whether a conflict has occurred. If the target timestamp column is not equal to the old source timestamp column, then a conflict has been detected. If a row already exists during an insert then a conflict has been detected. |
USE_VERSION |
Indicates that the primary key plus a version column (as configured in detect_expression ) will indicate whether a conflict has occurred. If the target version column is not equal to the old source version column, then a conflict has been detected. If a row already exists during an insert then a conflict has been detected. |
- Resolution Type
-
The choice of how to resolve a detected conflict is configured via the resolve type. Depending on the setting, two additional boolean settings may also be configured, namely "resolve row only" and "resolve changes only".
FALLBACK |
Indicates that when a conflict is detected the system should automatically apply the changes anyways. If the source operation was an insert, then an update will be attempted. If the source operation was an update and the row does not exist, then an insert will be attempted. If the source operation was a delete and the row does not exist, then the delete will be ignored. The resolve_changes_only flag controls whether all columns will be updated or only columns that have changed will be updated during a fallback operation. |
IGNORE |
Indicates that when a conflict is detected the system should automatically ignore the incoming change. The resolve_row_only column controls whether the entire batch should be ignore or just the row in conflict. |
MANUAL |
Indicates that when a conflict is detected the batch will remain in error until manual intervention occurs. A row in error is inserted into the INCOMING_ERROR table. The conflict detection id that detected the conflict is recorded (i.e., the conflict_id value from CONFLICT), along with the old data, new data, and the "current data" (by current data, we mean the unexpected data at the target which doesn’t match the old data as expected) in columns old_data, new_data, and cur_data. In order to resolve, the resolve_data column can be manually filled out which will be used on the next load attempt instead of the original source data. The resolve_ignore flag can also be used to indicate that the row should be ignored on the next load attempt. |
NEWER_WINS |
Indicates that when a conflict is detected by USE_TIMESTAMP or USE_VERSION that the either the source or the target will win based on the which side has the newer timestamp or higher version number. The resolve_row_only column controls whether the entire batch should be ignore or just the row in conflict. |
- Ping Back
-
For each configured conflict, you also have the ability to control if and how much "resolved" data is sent back to the node who’s data change is in conflict. This "ping back" behavior is specified by the following options.
REMAINING_ROWS |
The resolved data of the single row in the batch in conflict, along with the entire remainder of the batch, is sent back to the originating node. |
SINGLE_ROW |
The resolved data of the single row in the batch that caused the conflict is sent back to the originating node. |
OFF |
No data is sent back to the originating node, even if the resolved data doesn’t match the data the node sent. |
- Detection Expression
-
An expression that provides additional information about the detection mechanism. If the detection mechanism is use_timestamp or use_version then this expression will be the name of the timestamp or version column.
| The detect_expression can be used to exclude certain column names from being used. In order to exclude column1 and column2, the expression would be: |
excluded_column_names=column1,column2
- Resolve Changes Only
-
Indicates that when applying changes during an update that only data that has changed should be applied. Otherwise, all the columns will be updated. This really only applies to updates.
- Resolve Row Only
-
When 'resolve row only' is set to true, the system will ignore only the rows in conflict. When 'resolve row only' is set to false, the system will ignore the entire batch. This applies to a resolve type of 'ignore'.
- Channel
-
Optional channel that this setting will be applied to.
- Target Catalog
-
Optional database catalog that the target table belongs to. Only use this if the target table is not in the default catalog.
- Target Schema
-
Optional database schema that the target table belongs to. Only use this if the target table is not in the default schema.
- Target Table
-
Optional database table that this setting will apply to. If left blank, the setting will be for any table in the channel (if set) and in the specified node group link.
| Be aware that conflict detection will not detect changes to binary columns in the case where use_stream_lobs is true in the trigger for the table. In addition, some databases do not allow comparisons of binary columns whether use_stream_lobs is true or not. |
| Some platforms do not support comparisons of binary columns. Conflicts in binary column values will not be detected on the following platforms: DB2, DERBY, ORACLE, and SQLSERVER. |
4.12. Transforms
Transforms allow you to manipulate data on a source node or on a target node, as the data is being loaded or extracted.
| The transform source table must be configured for synchronization through a linked trigger (see Table Triggers). |
The source trigger creates the synchronization data, while the transformation configuration decides what to do with the synchronization data as it is either being extracted from the source or loaded into the target. You have the flexibility of defining different transformation behavior depending on whether the source change that triggered the synchronization was an Insert, Update, or Delete. In the case of Delete, you even have options on what exactly to do on the target side, be it a delete of a row, setting columns to specific values, or absolutely nothing at all.