Groups are defined in the NODE_GROUP table. The following SQL statements would create node groups for "corp" and "store" based on our retail store example.
insert into SYM_NODE_GROUP (node_group_id, description) values ('store', 'A retail store node');
insert into SYM_NODE_GROUP (node_group_id, description) values ('corp', 'A corporate node');
Group links are defined in the NODE_GROUP_LINK table. Links define how a node that belongs to a group will communicate with nodes in other groups. The following are the communication mechanisms that can be configured.
Push (P): Indicates that the source node will initiate communication over an HTTP PUT to push data to the target node.
Wait for Pull (W): Indicates that the source node will wait for a target node to connect via an HTTP GET to pull data.
Route-only (R): Indicates that the data will not be transported via SymmetricDS. This action type might be useful when using an XML publishing router or an audit table changes router.
The link also defines whether configuration data will be synchronized on the link. For example, you might not want remote nodes to be able to change configuration and affect other nodes in the network. In this case you would set sync_config_enabled to 0 on the appropriate link.
A link can be configured to use the same node group as the source and the target. This configuration allows a node group to sync with every other node in its group.
The following SQL statements link the "corp" and "store" node groups for synchronization. They configure the "store" nodes to push their data changes to the "corp" nodes, and the "corp" nodes to send changes to "store" nodes by waiting for a pull.
insert into SYM_NODE_GROUP_LINK (source_node_group, target_node_group, data_event_action) values ('store', 'corp', 'P');
insert into SYM_NODE_GROUP_LINK (source_node_group, target_node_group, data_event_action) values ('corp', 'store', 'W');
By categorizing data into channels and assigning channels to triggers (via TRIGGER), the user gains more control and visibility into the flow of data. In addition, SymmetricDS allows synchronization to be enabled, suspended, or scheduled by channel. The frequency of synchronization and the order in which data gets synchronized are also controlled at the channel level.
The following SQL statements set up channels for a retail store. An "item" channel includes data for items and their prices, while a "sale_transaction" channel includes data for ringing sales at a register.
insert into SYM_CHANNEL (channel_id, processing_order, max_batch_size, max_batch_to_send, extract_period_millis, batch_algorithm, enabled, description) values ('item', 10, 1000, 10, 0, 'default', 1, 'Item and pricing data');
insert into SYM_CHANNEL (channel_id, processing_order, max_batch_size, max_batch_to_send, extract_period_millis, batch_algorithm, enabled, description) values ('sale_transaction', 1, 1000, 10, 60000, 'transactional', 1, 'retail sale transactions from register');
Batching is the grouping of data, by channel, to be transferred and committed at the client together. There are three different out-of-the-box batching algorithms which may be configured in the batch_algorithm column on channel.
default: All changes that happen in a transaction are guaranteed to be batched together. Multiple transactions will be batched and committed together until there is no more data to be sent or the max_batch_size is reached.
transactional: Batches will map directly to database transactions. If there are many small database transactions, then there will be many batches. The max_batch_size column has no effect.
nontransactional: Multiple transactions will be batched and committed together until there is no more data to be sent or the max_batch_size is reached. The batch will be cut off at the max_batch_size regardless of whether it is in the middle of a transaction.
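As a sketch, an existing channel can be switched to a different batching algorithm with a simple update (this assumes 'nontransactional' is the name of the third algorithm and reuses the 'item' channel from the earlier example):

```sql
-- Switch the 'item' channel from the default algorithm to
-- nontransactional batching (illustrative only).
update SYM_CHANNEL
   set batch_algorithm = 'nontransactional'
 where channel_id = 'item';
```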
If a channel contains only tables that will be synchronized in one direction and data is routed to all the nodes in the target node groups, then batching on the channel can be optimized to share batches across nodes. This is an important feature when data needs to be routed to thousands of nodes. When this mode is detected, you will see batches created in OUTGOING_BATCH with the common_flag set to 1.
There are also several size-related parameters that can be set by channel. They include:
max_batch_size: Specifies the maximum number of data events to process within a batch for this channel.
max_batch_to_send: Specifies the maximum number of batches to send for a given channel during a 'synchronization' between two nodes. A 'synchronization' is equivalent to a push or a pull. For example, if there are 12 batches ready to be sent for a channel and max_batch_to_send is equal to 10, then only the first 10 batches will be sent even though 12 batches are ready.
max_data_to_route: Specifies the maximum number of data rows to route for a channel at a time.
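For illustration, these settings can be adjusted per channel with an update like the following (a sketch; the column names follow the parameter descriptions above and the 'item' channel is from the earlier example):

```sql
-- Tune batch and routing sizes for the 'item' channel
-- (values here are illustrative, not recommendations).
update SYM_CHANNEL
   set max_batch_size = 5000,
       max_batch_to_send = 20,
       max_data_to_route = 100000
 where channel_id = 'item';
```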
Based on your particular synchronization requirements, you can also specify whether old, new, and primary key data should be read and included during routing for a given channel. These are controlled by the columns use_old_data_to_route, use_row_data_to_route, and use_pk_data_to_route, respectively. By default, they are all 1 (true).
If data on a particular channel contains big lobs, you can set the column contains_big_lob to 1 (true) to provide SymmetricDS the hint that the channel contains big lobs. Some databases have shortcuts that SymmetricDS can take advantage of if it knows that the lob columns in DATA aren't going to contain large lobs. The definition of how large a 'big' lob is varies from database to database.
SymmetricDS captures synchronization data using database triggers. SymmetricDS' triggers are defined in the TRIGGER table. Each record is used by SymmetricDS when generating database triggers. Database triggers are only generated when a trigger is associated with a ROUTER whose source_node_group_id matches the node group id of the current node.
The source_table_name may contain the asterisk ('*') wildcard character so that one TRIGGER table entry can define synchronization for many tables. System tables and any tables that start with the SymmetricDS table prefix will be excluded. A list of wildcard tokens can also be supplied. If there are multiple tokens, they should be delimited with a comma. A wildcard token can also start with a bang ('!') to indicate an exclusive match. Tokens are always evaluated from left to right. When a table match is made, the table is either added to or removed from the list of tables. If another trigger already exists for a table, then that table is not included in the wildcard match (the explicitly defined trigger entry takes precedence).
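As a sketch of the wildcard syntax described above (the table names and the 'sale_transaction' channel here are illustrative), the following entry would capture changes for every table whose name starts with sale_ except sale_archive:

```sql
-- Wildcard token list: include sale_* tables, then exclude
-- sale_archive with a bang ('!') token. Tokens evaluate left to right.
insert into SYM_TRIGGER (trigger_id, source_table_name, channel_id, last_update_time, create_time)
values ('sale_tables', 'sale_*,!sale_archive', 'sale_transaction', current_timestamp, current_timestamp);
```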
When determining whether a data change has occurred, by default the triggers will record a change even if the data was updated to the same value(s) it held originally. For example, a data change will be captured if an update of one column in a row set the value to the same value it already was. There is a global property, trigger.update.capture.changed.data.only.enabled (false by default), that allows you to override this behavior. When set to true, SymmetricDS will only capture a change if the data has truly changed (i.e., when the new column data is not equal to the old column data).
trigger.update.capture.changed.data.only.enabled is currently only supported in the MySQL, DB2, SQL Server and Oracle dialects.
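For example, the property can be enabled in the engine properties file:

```properties
# Only capture updates when column values actually change
trigger.update.capture.changed.data.only.enabled=true
```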
The following SQL statement defines a trigger that will capture data for a table named "item" whenever data is inserted, updated, or deleted. The trigger is assigned to a channel also called 'item'.
insert into SYM_TRIGGER (trigger_id, source_table_name, channel_id, last_update_time, create_time) values ('item', 'item', 'item', current_timestamp, current_timestamp);
Note that many databases allow for multiple triggers of the same type to be defined. Each database defines the order in which the triggers fire differently. If you have additional triggers beyond those SymmetricDS installs on your table, please consult your database documentation to determine if there will be issues with the ordering of the triggers.
The TRIGGER_ROUTER table is used to define which specific combinations of triggers and routers are needed for your configuration. The relationship between triggers and routers is many-to-many, so this table serves as the join table to define which combinations are valid, as well as to define settings available at the trigger-router level of granularity.
Three important controls can be configured for a specific Trigger / Router combination: Enabled, Initial Loads, and Ping Back. The parameters for these can be found in the Trigger / Router mapping table, TRIGGER_ROUTER.
Each individual trigger-router combination can be disabled or enabled if needed. By default, a trigger router is enabled, but if you have a reason you wish to define a trigger router combination prior to it being active, you can set the enabled flag to 0. This will cause the trigger-router mapping to be sent to all nodes, but the trigger-router mapping will not be considered active or enabled for the purposes of capturing data changes or routing.
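As a sketch (the trigger_id and router_id values here are illustrative), a mapping can be defined ahead of time but left inactive like this:

```sql
-- Keep the trigger-router mapping defined but inactive until needed.
update SYM_TRIGGER_ROUTER
   set enabled = 0,
       last_update_time = current_timestamp
 where trigger_id = 'item' and router_id = 'corp-2-store';
```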
SymmetricDS, by default, avoids circular data changes. When a trigger fires as a result of SymmetricDS itself (such as the case when sync on incoming batch is set), it records the originating source node of the data change in source_node_id. During routing, if routing results in sending the data back to the originating source node, the data is not routed by default. If instead you wish to route the data back to the originating node, you can set the ping_back_enabled column for the particular trigger / router combination that needs it. This will cause the router to "ping" the data back to the originating node when it usually would not.
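For example (again with illustrative trigger_id and router_id values), ping back can be turned on for a single mapping:

```sql
-- Allow changes on this mapping to be routed back to the node
-- that originated them.
update SYM_TRIGGER_ROUTER
   set ping_back_enabled = 1,
       last_update_time = current_timestamp
 where trigger_id = 'item' and router_id = 'corp-2-store';
```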
Two lob-related settings are also available on TRIGGER:
use_stream_lobs: Specifies whether to capture lob data as the trigger is firing or to stream lob columns from the source tables using callbacks during extraction. A value of 1 indicates to stream from the source via callback; a value of 0 indicates that lob data is captured by the trigger.
use_capture_lobs: Provides a hint as to whether this trigger will capture big lob data. If set to 1, every effort will be made during data capture in the trigger and during data selection for initial load to use lob facilities to extract and store data in the database.
Occasionally, you may find that you need to capture and save away a piece of data present in another table when a trigger is firing. This data is typically needed for the purposes of determining where to 'route' the data to once routing takes place. Each trigger definition contains an optional external_select field which can be used to specify the data to be captured. Once captured, this data is available during routing in DATA's external_data field. For these cases, place a SQL select statement which returns the data item you need for routing in external_select. An example of the use of external select can be found in Section 3.6.7, “Utilizing External Select when Routing”.
Occasionally the decision of what data to load initially results in additional triggers. These triggers, known as Dead Triggers, are configured such that they do not capture any data changes. In other words, the sync_on_insert, sync_on_update, and sync_on_delete properties for the trigger are all set to false. However, since the trigger is specified, it will be included in the initial load of data for target nodes.
Why might you need a Dead Trigger? A dead Trigger might be used to load a read-only lookup table, for example. It could also be used to load a table that needs to be populated with example or default data. Another use is a recovery load of data for tables that have a single direction of synchronization. For example, a retail store records sales transactions that synchronize in one direction by trickling back to the central office. If the retail store needs to recover all the sales transactions from the central office, they can be sent as part of an initial load from the central office by setting up dead Triggers that "sync" in that direction.
The following SQL statement sets up a non-syncing dead Trigger that sends the sale_transaction table to the "store" Node Group from the "corp" Node Group during an initial load.
insert into sym_trigger (TRIGGER_ID, SOURCE_CATALOG_NAME, SOURCE_SCHEMA_NAME, SOURCE_TABLE_NAME, CHANNEL_ID,
  SYNC_ON_UPDATE, SYNC_ON_INSERT, SYNC_ON_DELETE, SYNC_ON_INCOMING_BATCH,
  NAME_FOR_UPDATE_TRIGGER, NAME_FOR_INSERT_TRIGGER, NAME_FOR_DELETE_TRIGGER,
  SYNC_ON_UPDATE_CONDITION, SYNC_ON_INSERT_CONDITION, SYNC_ON_DELETE_CONDITION,
  EXTERNAL_SELECT, TX_ID_EXPRESSION, EXCLUDED_COLUMN_NAMES,
  CREATE_TIME, LAST_UPDATE_BY, LAST_UPDATE_TIME)
values ('SALE_TRANSACTION_DEAD', null, null, 'SALE_TRANSACTION', 'transaction',
  0, 0, 0, 0, null, null, null, null, null, null, null, null, null,
  current_timestamp, 'demo', current_timestamp);
insert into sym_router (ROUTER_ID, TARGET_CATALOG_NAME, TARGET_SCHEMA_NAME, TARGET_TABLE_NAME,
  SOURCE_NODE_GROUP_ID, TARGET_NODE_GROUP_ID, ROUTER_TYPE, ROUTER_EXPRESSION,
  SYNC_ON_UPDATE, SYNC_ON_INSERT, SYNC_ON_DELETE,
  CREATE_TIME, LAST_UPDATE_BY, LAST_UPDATE_TIME)
values ('CORP_2_STORE', null, null, null, 'corp', 'store', null, null, 1, 1, 1,
  current_timestamp, 'demo', current_timestamp);
insert into sym_trigger_router (TRIGGER_ID, ROUTER_ID, INITIAL_LOAD_ORDER,
  INITIAL_LOAD_SELECT, CREATE_TIME, LAST_UPDATE_BY, LAST_UPDATE_TIME)
values ('SALE_TRANSACTION_DEAD', 'CORP_2_STORE', 100, null,
  current_timestamp, 'demo', current_timestamp);
A trigger row may be updated using SQL to change a synchronization definition.
SymmetricDS will look for changes each night or whenever the Sync Triggers Job
is run (see below). For example, a change to place the table price_changes
into the price channel would be accomplished with the following statement:
update SYM_TRIGGER set channel_id = 'price', last_update_by = 'jsmith', last_update_time = current_timestamp where source_table_name = 'price_changes';
All configuration changes should be managed centrally at the registration node. If enabled, configuration changes will be synchronized out to client nodes. When trigger changes reach the client nodes the Sync Triggers Job will run automatically.
Centrally, the trigger changes will not take effect until the Sync Triggers Job runs. Instead of waiting for the Sync Triggers Job to run overnight after making a Trigger change, you can invoke the syncTriggers() method over JMX or simply restart the SymmetricDS server. A complete record of trigger changes is kept in the table TRIGGER_HIST, which was discussed in Section 4.3.5, “Sync Triggers Job”.
SymmetricDS not only supports the synchronization of database tables, but it also supports the synchronization of files and folders from one node to another.
File synchronization features include:
Like database synchronization, file synchronization is configured in a series of database tables. The configuration was designed to be similar to database synchronization in order to maintain consistency and to give database synchronization users a sense of familiarity.
For database synchronization, SymmetricDS uses TRIGGER to configure which tables will capture data for synchronization and ROUTER to designate which nodes will be the source of data changes and which nodes will receive the data changes. TRIGGER_ROUTER links triggers to routers.
Likewise, for file synchronization, SymmetricDS uses FILE_TRIGGER to designate which base directories will be monitored. Each entry in FILE_TRIGGER designates one base directory to monitor for changes on the source system. The columns on FILE_TRIGGER provide additional settings for choosing specific files in the base directory that will be monitored, and whether to recurse into subdirectories, etc. File triggers are linked to routers by FILE_TRIGGER_ROUTER. The file trigger router not only links the source and the target node groups, but it also optionally provides the ability to override the base directory name at the target. FILE_TRIGGER_ROUTER also provides a flag that indicates if the target node should be seeded with the files from the source node during SymmetricDS's initial load process.
Not only is file synchronization configured similarly to database synchronization, but it also operates in a very similar way. The file system is monitored for changes via a background job that tracks the file system changes (this parallels the use of triggers to monitor for changes when synchronizing database changes). When a change is detected, it is written to the FILE_SNAPSHOT table. The file snapshot table represents the most recent known state of the monitored files. The file snapshot table has a SymmetricDS database trigger automatically installed on it so that when it is updated the changes are captured by SymmetricDS on an internal channel named filesync.
The changes to FILE_SNAPSHOT are then routed and batched by a file-synchronization-specific router that delegates to the configured router based on the FILE_TRIGGER_ROUTER configuration. The file sync router can make routing decisions based on the column data of the snapshot table, columns which contain attributes of the file like the name, path, size, and last modified time. Both old and new file snapshot data are also available. The router can, for example, parse the path or name of the file and use it as the node id to route to.
Batches of file snapshot changes are stored on the filesync channel in OUTGOING_BATCH. The existing SymmetricDS pull and push jobs ignore the filesync channel. Instead, batches on this channel are processed by file-synchronization-specific push and pull jobs.
When transferring data, the file sync push and pull jobs build a zip file dynamically based on the batched snapshot data. The zip file contains a directory per batch. The directory name is the batch_id. A sync.bsh Bean Shell script is generated and placed in the root of each batch directory. The Bean Shell script contains the commands to copy or delete files at their file destination from an extracted zip in the staging directory on the target node. The zip file is downloaded in the case of a pull, or, in the case of a push, is uploaded as an HTTP multi-part attachment.
Outgoing zip files are written and transferred from the outgoing staging directory. Incoming zip files are staged in the filesync_incoming staging directory by source node id. The filesync_incoming/{node_id} staging directory is cleared out before each subsequent delivery of files.
The acknowledgement of a batch happens the same way it is acknowledged in database synchronization. The client responds with an acknowledgement as part of the response during a file push or pull.
There are two types of Bean Shell scripts that can be leveraged to customize file synchronization behavior: before_copy_script and after_copy_script.
Each of these scripts have access to local variables that can be read or set to affect the behavior of copying files.
targetBaseDir: The preset base directory as configured in FILE_TRIGGER or overwritten in FILE_TRIGGER_ROUTER. This variable can be set by the before_copy_script to set a different target directory.
targetFileName: The name of the file that is being synchronized. This variable can be overwritten by the before_copy_script to rename a file at the target.
targetRelativeDir: The name of a directory relative to the target base directory to which the target file will be copied. The default value of this variable is the relative directory of the source. For example, if the source base directory is /src and the target base directory is /tgt and the file /src/subfolder/1.txt is changed, then the default targetRelativeDir will be subfolder. This variable can be overwritten by the before_copy_script to change the relative directory at the target. In the above example, if the variable is set to blank using the following script, then the target file will be copied to /tgt/1.txt.
targetRelativeDir = "";
processFile: This variable is set to true by default. A custom before_copy_script may process the file itself and set this variable to false to indicate that the file should NOT be copied to its target location.
sourceFileName: This is the name of the file.
sourceFilePath: This is the path where the file can be found relative to the batch directory.
batchDir: This is the staging directory where the batch has been extracted. The batchDir + sourceFilePath + sourceFileName can be used to locate the extracted file.
engine: This is the bound instance of the ISymmetricEngine that is processing a file. It gives access to all of the APIs available in SymmetricDS.
sourceNodeId: This is a bound variable that represents the nodeId that is the source of the file.
log: This is the bound instance of an org.slf4j.Logger that can be used to log to the SymmetricDS log file.
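To illustrate how these script variables interact, the following is a hypothetical before_copy_script sketch (the .tmp naming convention and per-node directory layout are assumptions, not part of the examples that follow):

```java
// Bean Shell before_copy_script sketch (hypothetical example).
if (sourceFileName.endsWith(".tmp")) {
    // Skip temporary files entirely: do not copy them to the target.
    processFile = false;
} else {
    // Place each file under a folder named for the node that sent it.
    targetRelativeDir = sourceNodeId;
    log.info("Copying " + sourceFileName + " into " + targetRelativeDir);
}
```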
The following example is for a configuration with client and server node groups. Creation, modification, and deletion of files with the extension of txt will be captured recursively in the /filesync/server/all directory. A before copy script will set the targetBaseDir to /filesync/clients/{externalId}.
INSERT INTO sym_file_trigger (trigger_id, base_dir, recurse, includes_files, excludes_files, sync_on_create,
  sync_on_modified, sync_on_delete, sync_on_ctl_file, delete_after_sync, before_copy_script, after_copy_script,
  create_time, last_update_by, last_update_time)
VALUES ('sync_directory', '/filesync/server/all', 1, '*.txt', null, 1, 1, 1, 0, 0,
  'targetBaseDir = "/filesync/clients/" + engine.getParameterService().getExternalId();', null,
  current_timestamp, 'example', current_timestamp);
INSERT INTO sym_file_trigger_router (trigger_id, router_id, enabled, initial_load_enabled, target_base_dir,
  conflict_strategy, create_time, last_update_by, last_update_time)
VALUES ('sync_directory', 'server_2_client', 1, 1, '', 'SOURCE_WINS', current_timestamp,
  'example', current_timestamp);
INSERT INTO sym_router (router_id, target_catalog_name, target_schema_name, target_table_name,
  source_node_group_id, target_node_group_id, router_type, router_expression,
  sync_on_update, sync_on_insert, sync_on_delete, create_time, last_update_by, last_update_time)
VALUES ('server_2_client', null, null, null, 'server', 'client', 'default', null, 1, 1, 1,
  current_timestamp, 'example', current_timestamp);
The following example is also for a configuration with client and server node groups. This example monitors the /filesync/server/nodes directory. It expects the directory to contain subdirectories that are named by the node_ids in the client group. Any files put directly into a folder with the name of the node will be routed to that node.
Note that the router is a Section 3.6.2, “Column Match Router” that matches the client node_id with the value of the RELATIVE_DIR column in FILE_SNAPSHOT. Because the router is looking for an exact match, any files in subdirectories would result in a path of node_id/subdir, which would not match.
INSERT INTO sym_file_trigger (trigger_id, base_dir, recurse, includes_files, excludes_files, sync_on_create,
  sync_on_modified, sync_on_delete, sync_on_ctl_file, delete_after_sync, before_copy_script, after_copy_script,
  create_time, last_update_by, last_update_time)
VALUES ('node_specific', '/filesync/server/nodes', 1, null, null, 1, 1, 1, 0, 0, '', null,
  current_timestamp, 'example', current_timestamp);
INSERT INTO sym_file_trigger_router (trigger_id, router_id, enabled, initial_load_enabled, target_base_dir,
  conflict_strategy, create_time, last_update_by, last_update_time)
VALUES ('node_specific', 'router_files_to_node', 1, 1, '/filesync/clients', 'SOURCE_WINS',
  current_timestamp, 'example', current_timestamp);
INSERT INTO sym_router (router_id, target_catalog_name, target_schema_name, target_table_name,
  source_node_group_id, target_node_group_id, router_type, router_expression,
  sync_on_update, sync_on_insert, sync_on_delete, create_time, last_update_by, last_update_time)
VALUES ('router_files_to_node', null, null, null, 'server', 'client', 'column',
  'RELATIVE_DIR = :NODE_ID ', 1, 1, 1, current_timestamp, 'example', current_timestamp);
Routers provided in the base implementation currently include the default router, the column match router, the lookup table router, the subselect router, and the Bean Shell ('bsh') scripted router, each described below.
The mapping between the set of triggers and the set of routers is many-to-many. This means that one trigger can capture changes and route to multiple locations. It also means that one router can be defined and associated with many different triggers.
The simplest router is a router that sends all the data that is captured by its associated triggers to all the nodes that belong to the target node group defined in the router. A router is defined as a row in the ROUTER table. It is then linked to triggers in the TRIGGER_ROUTER table.
The following SQL statement defines a router that will send data from the 'corp' group to the 'store' group.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, create_time, last_update_time) values ('corp-2-store','corp', 'store', current_timestamp, current_timestamp);
The following SQL statement maps the 'corp-2-store' router to the item trigger.
insert into SYM_TRIGGER_ROUTER (trigger_id, router_id, initial_load_order, create_time, last_update_time) values ('item', 'corp-2-store', 1, current_timestamp, current_timestamp);
Sometimes requirements may exist that require data to be routed based on the current value or the old value of a column in the table that is being routed. Column routers are configured by setting the router_type column on the ROUTER table to column and setting the router_expression column to an equality expression that represents the expected value of the column.
The first part of the expression is always the column name. The column name should always be defined in upper case. The upper case column name prefixed by OLD_ can be used for a comparison being done with the old column data value.
The second part of the expression can be a constant value, a token that represents another column, or a token that represents some other SymmetricDS concept. Token values always begin with a colon (:).
Consider a table that needs to be routed to all nodes in the target group only when a status column is set to 'READY TO SEND.' The following SQL statement will insert a column router to accomplish that.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-ok','corp', 'store', 'column', 'STATUS=READY TO SEND', current_timestamp, current_timestamp);
Consider a table that needs to be routed to all nodes in the target group only when a status column changes values. The following SQL statement will insert a column router to accomplish that. Note the use of OLD_STATUS, where the OLD_ prefix gives access to the old column value.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-status','corp', 'store', 'column', 'STATUS!=:OLD_STATUS', current_timestamp, current_timestamp);
Consider a table that needs to be routed to only nodes in the target group whose STORE_ID column matches the external id of a node. The following SQL statement will insert a column router to accomplish that.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-id','corp', 'store', 'column', 'STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
Attributes on a NODE that can be referenced with tokens include :NODE_ID, :EXTERNAL_ID, and :NODE_GROUP_ID.
Captured EXTERNAL_DATA is also available for routing as a virtual column.
Consider a table that needs to be routed to a redirect node defined by its external id in the REGISTRATION_REDIRECT table. The following SQL statement will insert a column router to accomplish that.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-redirect','corp', 'store', 'column', 'STORE_ID=:REDIRECT_NODE', current_timestamp, current_timestamp);
More than one column may be configured in a router_expression. When more than one column is configured, all matches are added to the list of nodes to route to. The following is an example where the STORE_ID column may contain the STORE_ID to route to or the constant of ALL which indicates that all nodes should receive the update.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-multiple-matches','corp', 'store', 'column', 'STORE_ID=ALL or STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
The NULL keyword may be used to check if a column is null. If the column is null, then data will be routed to all nodes who qualify for the update. This following is an example where the STORE_ID column is used to route to a set of nodes who have a STORE_ID equal to their EXTERNAL_ID, or to all nodes if the STORE_ID is null.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-multiple-matches','corp', 'store', 'column', 'STORE_ID=NULL or STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
A lookup table may contain the id of the node where data needs to be routed. This could be an existing table or an ancillary table that is added specifically for the purpose of routing data. Lookup table routers are configured by setting the router_type column on the ROUTER table to lookuptable and setting a list of configuration parameters in the router_expression column.
Each of the following configuration parameters is required:
LOOKUP_TABLE: The name of the lookup table.
KEY_COLUMN: The name of the column on the table that is being routed. It will be used as a key into the lookup table.
LOOKUP_KEY_COLUMN: The name of the column that is the key on the lookup table.
EXTERNAL_ID_COLUMN: The name of the column on the lookup table that contains the external_id of the node to route to.
Note that the lookup table will be read into memory and cached for the duration of a routing pass for a single channel.
Consider a table that needs to be routed to a specific store, but the data in the changing table only contains brand information. In this case, the STORE table may be used as a lookup table.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-ok','corp', 'store', 'lookuptable', 'LOOKUP_TABLE=STORE KEY_COLUMN=BRAND_ID LOOKUP_KEY_COLUMN=BRAND_ID EXTERNAL_ID_COLUMN=STORE_ID', current_timestamp, current_timestamp);
Sometimes routing decisions need to be made based on data that is not in the current row being synchronized. A 'subselect' router can be used in these cases. A 'subselect' is configured with a router_expression that is a SQL select statement which returns a result set of the node ids that need to be routed to. Column tokens can be used in the SQL expression and will be replaced with row column data. The overhead of using this router type is high because the 'subselect' statement runs for each row that is routed. It should not be used for tables that have a lot of rows that are updated. It also has the disadvantage that if the data being relied on to determine the node id has been deleted before routing takes place, then no results would be returned and routing would not happen. The router_expression you specify is appended to the following SQL statement in order to select the node ids:
select c.node_id from sym_node c where c.node_group_id=:NODE_GROUP_ID and c.sync_enabled=1 and ...
As you can see, you have access to information about the node currently under consideration for routing through the 'c' alias, for example c.external_id. There are two node-related tokens you can use in your expression: :NODE_GROUP_ID and :EXTERNAL_ID. Column names representing data for the row in question are prefixed with a colon as well, for example :EMPLOYEE_ID or :OLD_EMPLOYEE_ID. Here, the OLD_ prefix indicates the value before the change in cases where the old data has been captured.
For an example, consider the case where an Order table and an OrderLineItem table need to be routed to a specific store. The Order table has columns named order_id and STORE_ID. A store node has an external_id that is equal to the STORE_ID on the Order table. OrderLineItem, however, has only a foreign key to its Order, order_id. To route OrderLineItems to the same nodes that the Order will be routed to, we need to reference the master Order record.
There are two possible ways to solve this in SymmetricDS. One is to configure a 'subselect' router_type on the ROUTER table, shown below. (The other possible approach is to use an external_select to capture the data via a trigger for use in a column match router, demonstrated in Section 3.6.7, “Utilizing External Select when Routing”.)
Our solution utilizing subselect compares the external id of the current node with the store id from the Order table where the order id matches the order id of the current row being routed:
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store','corp', 'store', 'subselect', 'c.external_id in (select STORE_ID from order where order_id=:ORDER_ID)', current_timestamp, current_timestamp);
As a final note, the parent row in Order must still exist at the moment of routing for the child rows (OrderLineItem) to route, since the select statement is run when routing is occurring, not when the change data is first captured.
When more flexibility is needed in the logic to choose the nodes to route to, then a scripted router may be used. The currently available scripting language is Bean Shell, a Java-like scripting language. Documentation for the Bean Shell scripting language can be found at http://www.beanshell.org .
The router_type for a Bean Shell scripted router is 'bsh'. The router_expression is a valid Bean Shell script that adds the node ids to route to into the targetNodes collection, which is bound to the script.
Also bound to the script evaluation is a list of nodes. The list of nodes is a list of eligible org.jumpmind.symmetric.model.Node objects. The current data column values and the old data column values are bound to the script evaluation as Java object representations of the column data. The columns are bound using the uppercase names of the columns. Old values are bound to uppercase representations that are prefixed with 'OLD_'.
If you need access to any of the SymmetricDS services, then the instance of org.jumpmind.symmetric.ISymmetricEngine is accessible via the bound engine variable.
In the following example, the node_id is a combination of STORE_ID and WORKSTATION_NUMBER, both of which are columns on the table that is being routed.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-bsh','corp', 'store', 'bsh', 'targetNodes.add(STORE_ID + "-" + WORKSTATION_NUMBER);', current_timestamp, current_timestamp);
The same could also be accomplished by simply returning the node id. The last line of a bsh script is always the return value.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-bsh','corp', 'store', 'bsh', 'STORE_ID + "-" + WORKSTATION_NUMBER', current_timestamp, current_timestamp);
The following example will synchronize to all nodes if the FLAG column has changed, otherwise no nodes will be synchronized. Note that here we make use of OLD_, which provides access to the old column value.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-flag-changed','corp', 'store', 'bsh', 'FLAG != null && !FLAG.equals(OLD_FLAG)', current_timestamp, current_timestamp);
The next example shows a script that iterates over each eligible node and checks to see if the trimmed value of the column named STATION equals the external_id.
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-trimmed-station','corp', 'store', 'bsh', 'for (org.jumpmind.symmetric.model.Node node : nodes) { if (STATION != null && node.getExternalId().equals(STATION.trim())) { targetNodes.add(node.getNodeId()); } }', current_timestamp, current_timestamp);
This router audits captured data by recording the change in an audit table that the router creates and keeps up to date (as long as auto.config.database is set to true). The router creates a table named the same as the table for which data was captured, with the suffix _AUDIT. It will contain all of the same columns as the original table with the same data types, except that each column is nullable with no default values.
Three extra "AUDIT" columns are added to the table: AUDIT_ID (the unique primary key of the audit table), AUDIT_TIME (the time at which the change was recorded), and AUDIT_EVENT (the DML type: 'I', 'U', or 'D').
The following is an example of an audit router:
insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, create_time, last_update_time) values ('audit_at_corp','corp', 'local', 'audit', current_timestamp, current_timestamp);
The audit router captures data for a group link. For the audit router to work it must be associated with a node_group_link with an action of type 'R'. The 'R' stands for 'only routes to'. In the above example, we refer to a 'corp to local' group link. Here, local is a new node_group created for the audit router. No nodes belong to the 'local' node_group. If a trigger linked to an audit router fires on the corp node, a new audit table will be created at the corp node with the new data inserted.
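Following the NODE_GROUP and NODE_GROUP_LINK column lists shown earlier, the 'corp to local' setup described above could be created with statements like the following (the group description is illustrative):
insert into SYM_NODE_GROUP (node_group_id, description) values ('local', 'A virtual group used only for audit routing'); insert into SYM_NODE_GROUP_LINK (source_node_group, target_node_group, data_event_action) values ('corp', 'local', 'R');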
There may be times when you wish to route based on a piece of data that exists in a table other than the one being routed. The approach, first discussed in Section 3.6.4, “Subselect Router”, is to utilize an external_select to save away data in external_data, which can then be referenced during routing.
Reconsider subselect's Order / OrderLineItem example (found in Section 3.6.4, “Subselect Router” ), where routing for the line item is accomplished by linking to the "header" Order row. As an alternate way of solving the problem, we will now use External Select combined with a column match router.
In this version of the solution, the STORE_ID is captured from the Order table in the EXTERNAL_DATA column when the trigger fires. The router is configured to route based on the captured EXTERNAL_DATA to all nodes whose external id matches the captured external data.
insert into SYM_TRIGGER (trigger_id,source_table_name,channel_id,external_select, last_update_time,create_time) values ('orderlineitem', 'orderlineitem', 'orderlineitem','select STORE_ID from order where order_id=$(curTriggerValue).$(curColumnPrefix)order_id', current_timestamp, current_timestamp); insert into SYM_ROUTER (router_id, source_node_group_id, target_node_group_id, router_type, router_expression, create_time, last_update_time) values ('corp-2-store-ext','corp', 'store', 'column', 'EXTERNAL_DATA=:EXTERNAL_ID', current_timestamp, current_timestamp);
The following variables can be used with the external select:
Variable to be replaced with the NEW or OLD column alias provided by the trigger context, which is platform specific. For insert and update triggers, the NEW alias is used; for delete triggers, the OLD alias is used. For example, "$(curTriggerValue).COLUMN" becomes ":new.COLUMN" for an insert trigger on Oracle.
Variable to be replaced with the NEW_ or OLD_ column prefix for platforms that don't support column aliases. This is currently only used by the H2 database. All other platforms will replace the variable with an empty string. For example "$(curColumnPrefix)COLUMN" becomes "NEW_COLUMN" on H2 and "COLUMN" on Oracle.
The advantage of this approach over the 'subselect' approach is that it guards against the (somewhat unlikely) possibility that the master Order table row might have been deleted before routing has taken place. This external select solution is also a bit more efficient than the 'subselect' approach, although the triggers produced do run the extra external_select SQL inline with application database updates.
Conflict detection and resolution is new as of SymmetricDS 3.0. Conflict detection is the act of determining if an insert, update or delete is in "conflict" due to the target data row not being consistent with the data at the source prior to the insert/update/delete. Conflict resolution is the act of figuring out what to do when a conflict is detected.
Conflict detection and resolution strategies are configured in the CONFLICT table. They are configured at minimum for a specific NODE_GROUP_LINK . The configuration can also be specific to a CHANNEL and/or table.
Conflict detection is configured in the detect_type and detect_expression columns of CONFLICT. The value for detect_expression depends on the detect_type. Conflicts are detected while data is being loaded into a target system.
USE_PK_DATA: Indicates that only the primary key is used to detect a conflict. If a row exists with the same primary key, then no conflict is detected during an update or a delete. Updates and deletes are resolved using only the primary key columns. If a row already exists during an insert, then a conflict has been detected.
USE_OLD_DATA: Indicates that all of the old data values are used to detect a conflict. Old data is the data values of the row on the source system prior to the change. If a row exists with the same old values on the target system as they were on the source system, then no conflict is detected during an update or a delete. If a row already exists during an insert, then a conflict has been detected.
Note that some platforms do not support comparisons of binary columns. Conflicts in binary column values will not be detected on the following platforms: DB2, DERBY, ORACLE, and SQLSERVER.
USE_CHANGED_DATA: Indicates that the primary key plus any data that has changed on the source system will be used to detect a conflict. If a row exists with the same old values on the target system as they were on the source system for the columns that have changed on the source system, then no conflict is detected during an update or a delete. If a row already exists during an insert, then a conflict has been detected.
Note that some platforms do not support comparisons of binary columns. Conflicts in binary column values will not be detected on the following platforms: DB2, DERBY, ORACLE, and SQLSERVER.
The detect_expression can be used to exclude certain column names from being used. In order to exclude column1 and column2, the expression would be: excluded_column_names=column1,column2
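As a sketch, a changed-data detection that excludes a LAST_UPDATE_TIME maintenance column might be configured like the following (the conflict id, excluded column name, and abbreviated column list are illustrative, and 'FALLBACK' assumes that is the name of the fallback resolution type):
insert into SYM_CONFLICT (conflict_id, source_node_group_id, target_node_group_id, detect_type, detect_expression, resolve_type, create_time, last_update_time) values ('corp-2-store-changed', 'corp', 'store', 'USE_CHANGED_DATA', 'excluded_column_names=LAST_UPDATE_TIME', 'FALLBACK', current_timestamp, current_timestamp);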
USE_TIMESTAMP: Indicates that the primary key plus a timestamp column (as configured in detect_expression) will indicate whether a conflict has occurred. If the target timestamp column is not equal to the old source timestamp column, then a conflict has been detected. If a row already exists during an insert, then a conflict has been detected.
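For example, a timestamp-based detection resolved in favor of the newer change might look like the following (the UPDATE_TIME column name and abbreviated column list are illustrative, and 'NEWER_WINS' assumes that is the name of the newer-wins resolution type):
insert into SYM_CONFLICT (conflict_id, source_node_group_id, target_node_group_id, detect_type, detect_expression, resolve_type, create_time, last_update_time) values ('corp-2-store-ts', 'corp', 'store', 'USE_TIMESTAMP', 'UPDATE_TIME', 'NEWER_WINS', current_timestamp, current_timestamp);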
USE_VERSION: Indicates that the primary key plus a version column (as configured in detect_expression) will indicate whether a conflict has occurred. If the target version column is not equal to the old source version column, then a conflict has been detected. If a row already exists during an insert, then a conflict has been detected.
Be aware that conflict detection will not detect changes to binary columns when use_stream_lobs is true in the trigger for the table. In addition, some databases do not allow comparisons of binary columns whether use_stream_lobs is true or not.
The choice of how to resolve a detected conflict is configured via the resolve_type column. Depending on the setting, two additional boolean settings may also be configured, namely resolve_row_only and resolve_changes_only, as discussed in the resolution settings below.
FALLBACK: Indicates that when a conflict is detected the system should automatically apply the changes anyway. If the source operation was an insert, then an update will be attempted. If the source operation was an update and the row does not exist, then an insert will be attempted. If the source operation was a delete and the row does not exist, then the delete will be ignored. The resolve_changes_only flag controls whether all columns will be updated or only columns that have changed will be updated during a fallback operation.
IGNORE: Indicates that when a conflict is detected the system should automatically ignore the incoming change. The resolve_row_only column controls whether the entire batch should be ignored or just the row in conflict.
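A sketch of an ignore resolution that drops only the conflicting row rather than the whole batch (assuming the primary-key detection type is named 'USE_PK_DATA'; the conflict id and abbreviated column list are illustrative):
insert into SYM_CONFLICT (conflict_id, source_node_group_id, target_node_group_id, detect_type, resolve_type, resolve_row_only, create_time, last_update_time) values ('corp-2-store-ignore', 'corp', 'store', 'USE_PK_DATA', 'IGNORE', 1, current_timestamp, current_timestamp);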
MANUAL: Indicates that when a conflict is detected the batch will remain in error until manual intervention occurs. A row in error is inserted into the INCOMING_ERROR table. The conflict detection id that detected the conflict is recorded (i.e., the conflict_id value from CONFLICT), along with the old data, new data, and the "current data" (by current data, we mean the unexpected data at the target which doesn't match the old data as expected) in the columns old_data, new_data, and cur_data.
In order to resolve, the resolve_data column can be manually filled out; it will be used on the next load attempt instead of the original source data. The resolve_ignore flag can also be used to indicate that the row should be ignored on the next load attempt.
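For instance, to skip a conflicting row on the next load attempt, you might update INCOMING_ERROR along these lines (the identifying column values are illustrative):
update SYM_INCOMING_ERROR set resolve_ignore = 1 where batch_id = 100 and node_id = '001' and failed_row_number = 1;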
NEWER_WINS: Indicates that when a conflict is detected by USE_TIMESTAMP or USE_VERSION, either the source or the target will win based on which side has the newer timestamp or higher version number. The resolve_row_only column controls whether the entire batch should be ignored or just the row in conflict.
For each configured conflict, you also have the ability to control if and how much "resolved" data is sent back to the node whose data change is in conflict. This "ping back" behavior is specified by the setting of the ping_back column and can be one of the following values:
OFF: No data is sent back to the originating node, even if the resolved data doesn't match the data the node sent.
SINGLE_ROW: The resolved data of the single row in the batch that caused the conflict is sent back to the originating node.
REMAINING_ROWS: The resolved data of the single row in the batch in conflict, along with the entire remainder of the batch, is sent back to the originating node.
New as of SymmetricDS 2.4, SymmetricDS is able to transform synchronized data by way of configuration (previously, for most cases a custom data loader would need to have been written). This transformation can take place on a source node or on a target node, as the data is being loaded or extracted. With this feature you can, for example:
Copy a column from a source table to two (or more) target table columns,
Merge columns from two or more source tables into a single row in a target table,
Insert constants in columns in target tables based on source data synchronizations,
Insert multiple rows of data into a single target table based on one change in a source table,
Apply a Bean Shell script to achieve a custom transform when loading into the target database.
These transformations can take place either on the target or on the source, and as data is either being extracted or loaded. In either case, the transformation is initiated due to existence of a source synchronization trigger. The source trigger creates the synchronization data, while the transformation configuration decides what to do with the synchronization data as it is either being extracted from the source or loaded into the target. You have the flexibility of defining different transformation behavior depending on whether the source change that triggered the synchronization was an Insert, Update, or Delete. In the case of Delete, you even have options on what exactly to do on the target side, be it a delete of a row, setting columns to specific values, or absolutely nothing at all.
A few key concepts are important to keep in mind to understand how SymmetricDS performs transformations. The first concept is that of the "source operation" or "source DML type", which is the type of operation that occurred to generate the synchronization data in the first place (i.e., an insert, a delete, or an update). Your transformations can be configured to act differently based on the source DML type, if desired. When transforming, by default the DML action taken on the target matches that of the action taken on the row in the source (although this behavior can be altered through configuration if needed). If the source DML type is an Insert, for example, the resulting transformation DML(s) will be Insert(s).
Another important concept is the way in which transforms are applied. Each source operation may map to one or more transforms and result in one or more operations on the target tables. Each of these target operations are performed as independent operations in sequence and must be "complete" from a SQL perspective. In other words, you must define columns for the transformation that are sufficient to fill in any primary key or other required data in the target table if the source operation was an Insert, for example.
Please note that the transformation engine relies on a source trigger / router existing to supply the source data for the transformation. The transform configuration will never be used if there is no trigger / router combination defined for the given source table and target node group.
SymmetricDS stores its transformation configuration in two configuration tables, TRANSFORM_TABLE and TRANSFORM_COLUMN . Defining a transformation involves configuration in both tables, with the first table defining which source and destination tables are involved, and the second defining the columns involved in the transformation and the behavior of the data for those columns. We will explain the various options available in both tables and the various pre-defined transformation types.
To define a transformation, you will first define the source table and target table that applies to a particular transformation. The source and target tables, along with a unique identifier (the transform_id column) are defined in TRANSFORM_TABLE . In addition, you will specify the source_node_group_id and target_node_group_id to which the transform will apply, along with whether the transform should occur on the Extract step or the Load step (transform_point). All of these values are required.
Three additional configuration settings are also defined at the source-target table level: the order of the transformations, the behavior when deleting, and whether an update should always be attempted first. More specifically,
For each transformation defined in TRANSFORM_TABLE , the columns to be transformed (and how they are transformed) are defined in TRANSFORM_COLUMN . This column-level table typically has several rows for each transformation id, each of which defines the source column name, the target column name, as well as the following details:
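To make the two tables concrete, a minimal transform that copies ITEM rows into an ITEM_COPY table at load time might be sketched as follows (the table, column, and id names are illustrative, the 'copy' transform type and 'IMPLIED' column policy are assumptions, and some optional columns are omitted):
insert into SYM_TRANSFORM_TABLE (transform_id, source_node_group_id, target_node_group_id, transform_point, source_table_name, target_table_name, column_policy, create_time, last_update_time) values ('item-copy', 'corp', 'store', 'LOAD', 'ITEM', 'ITEM_COPY', 'IMPLIED', current_timestamp, current_timestamp); insert into SYM_TRANSFORM_COLUMN (transform_id, include_on, target_column_name, source_column_name, pk, transform_type, transform_order, create_time, last_update_time) values ('item-copy', '*', 'ITEM_ID', 'ITEM_ID', 1, 'copy', 1, current_timestamp, current_timestamp);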
There are several pre-defined transform types available in SymmetricDS. Additional ones can be defined by creating and configuring an extension point which implements the IColumnTransform interface. The pre-defined transform types include the following (the transform_type entry is shown in parentheses):
Variable Transform ('variable'): This transformation sets the target column to the value of the pre-defined variable named in the transform_expression: system_date is the current system date, system_timestamp is the current system date and time, source_node_id is the node id of the source, target_node_id is the node id of the target, null is a null value, and old_column_value is the column's old value prior to the DML operation.
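For instance, a transform column that fills a target UPDATE_TIME column with the load-time timestamp might look like this (the transform id and column name are illustrative, assuming the transform_type is named 'variable'):
insert into SYM_TRANSFORM_COLUMN (transform_id, include_on, target_column_name, transform_type, transform_expression, transform_order, create_time, last_update_time) values ('item-copy', '*', 'UPDATE_TIME', 'variable', 'system_timestamp', 2, current_timestamp, current_timestamp);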
Substring Transform ('substr'): This transformation computes a substring of the source column data and uses the substring as the target column value. The transform_expression can be a single integer (n, the beginning index) or a pair of comma-separated integers (n,m, the beginning and ending index). The transform behaves as the Java substring function would using the specified values in transform_expression.
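For instance, keeping the first ten characters of a source LONG_CODE column might be sketched as (the names are illustrative, assuming the transform_type is named 'substr'):
insert into SYM_TRANSFORM_COLUMN (transform_id, include_on, target_column_name, source_column_name, transform_type, transform_expression, transform_order, create_time, last_update_time) values ('item-copy', '*', 'SHORT_CODE', 'LONG_CODE', 'substr', '0,10', 3, current_timestamp, current_timestamp);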
BeanShell Script Transform ('bsh'): This transformation runs the Bean Shell script provided in the transform_expression and uses the result as the target column value. Several variables are bound to the script: COLUMN_NAME is a variable for a source column in the row, where the variable name is the column name in uppercase; currentValue is the value of the current source column; oldValue is the old value of the source column for an updated row; sqlTemplate is an org.jumpmind.db.sql.ISqlTemplate object for querying or updating the database; channelId is a reference to the channel on which the transformation is happening; sourceNode is an org.jumpmind.symmetric.model.Node object that represents the node from where the data came; targetNode is an org.jumpmind.symmetric.model.Node object that represents the node where the data is being loaded.
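A sketch of such a script transform that concatenates two source columns into a target FULL_NAME column (the names are illustrative, assuming the transform_type is named 'bsh'):
insert into SYM_TRANSFORM_COLUMN (transform_id, include_on, target_column_name, transform_type, transform_expression, transform_order, create_time, last_update_time) values ('item-copy', '*', 'FULL_NAME', 'bsh', 'FIRST_NAME + " " + LAST_NAME', 4, current_timestamp, current_timestamp);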
#{COLUMN_NAME} is a variable for a source column in the row, where the variable name is the column name in uppercase; #{currentValue} is the value of the current source column; #{oldValue} is the old value of the source column for an updated row.
New as of SymmetricDS 3.1, SymmetricDS is capable of taking actions upon the load of certain data via configurable load filters. This configurable option is in addition to the already existing option of writing a class that implements IDatabaseWriterFilter . A configurable load filter watches for specific data that is being loaded and then takes action based on the load of that data.
You specify which data to act on by specifying a source and target node group (data extracted from this node group, and loaded into that node group), and a target catalog, schema and table name. You can decide to take action on rows that are inserted, updated and/or deleted, and can also further delineate which rows of the target table to take action on by specifying additional criteria in the bean shell script that is executed in response to the loaded data. As an example, old and new values for the row of data being loaded are available in the bean shell script, so you can act on rows with a certain column value in old or new data.
The action taken is based on a bean shell script that you can provide as part of the configuration. Actions can be taken at different points in the load process including before write, after write, at batch complete, at batch commit and/or at batch rollback.
SymmetricDS stores its load filter configuration in a single table called LOAD_FILTER . The load filter table allows you to specify the following:
As part of the bean shell load filters, SymmetricDS provides certain variables for use in the bean shell script. Those variables include:
The following is an example of a load filter that watches a table named TABLE_TO_WATCH being loaded from the Server Node Group to the Client Node Group for inserts or updates, and performs an initial load on a table named "TABLE_TO_RELOAD" for KEY_FIELD on the reload table equal to a column named KEY_FIELD on the TABLE_TO_WATCH table.
insert into sym_load_filter (LOAD_FILTER_ID, LOAD_FILTER_TYPE, SOURCE_NODE_GROUP_ID, TARGET_NODE_GROUP_ID, TARGET_CATALOG_NAME, TARGET_SCHEMA_NAME, TARGET_TABLE_NAME, FILTER_ON_UPDATE, FILTER_ON_INSERT, FILTER_ON_DELETE, BEFORE_WRITE_SCRIPT, AFTER_WRITE_SCRIPT, BATCH_COMPLETE_SCRIPT, BATCH_COMMIT_SCRIPT, BATCH_ROLLBACK_SCRIPT, HANDLE_ERROR_SCRIPT, CREATE_TIME, LAST_UPDATE_BY, LAST_UPDATE_TIME, LOAD_FILTER_ORDER, FAIL_ON_ERROR) values ('TABLE_TO_RELOAD','BSH','Client','Server',NULL,NULL, 'TABLE_TO_WATCH',1,1,0,null, 'engine.getDataService().reloadTable(context.getBatch().getSourceNodeId(), table.getCatalog(), table.getSchema(), "TABLE_TO_RELOAD","KEY_FIELD=''" + KEY_FIELD + "''");' ,null,null,null,null,sysdate,'userid',sysdate,1,1);
As you probably know by now, SymmetricDS stores its single configuration centrally and distributes it to all nodes. By default, a trigger-router is in effect for all nodes in the source node group or target node group. Triggers will be established on each node that is a member of the source node group, and changes will be routed to all relevant nodes that are members of the target node group. If, for example, the router routes to "all" nodes, "all" means every node that is in the target node group. This is the default behavior of SymmetricDS.
Once in production, however, you will likely find you need or want to make configuration changes to triggers and routers as new features are rolled out to your network of SymmetricDS nodes. You may, for example, wish to "pilot" a new configuration, containing new synchronizations, only on specific nodes initially, and then increase the size of the pilot over time. For this purpose, SymmetricDS provides the ability to specify that only particular trigger-router combinations are applicable to particular nodes. It does this by allowing you to define an arbitrary collection of nodes, called a "grouplet", and then choosing which trigger-routers apply to the normal set of nodes (the default behavior) and which apply just to nodes in one or more "grouplets". This allows you, essentially, to filter the list of nodes that would otherwise be included as source nodes and/or target nodes. Through the use of grouplets, you can, for example, specify a subset of nodes on which a given trigger would be created. It also allows you to specify a subset of the normal set of nodes a change would be routed to. This behavior is in addition to, and occurs before, any subsetting or filtering the router might otherwise do.
In its simplest form, a grouplet is just an arbitrary collection of nodes. To define a grouplet, you start by creating a grouplet with a unique id, a description, and a link policy, as defined in GROUPLET. To define which nodes are members of (or are not members of) a grouplet, you provide a list of external ids of the nodes in GROUPLET_LINK. How those external ids are used varies based on the grouplet link policy.
The grouplet_link_policy can be either I or E, representing an "inclusive" list of nodes or an "exclusive" list of nodes, respectively. In the case of "inclusive", you'll be listing each external id to be included in the grouplet. In the case of "exclusive", all nodes will be included in the grouplet except ones which have an external id in the list of external ids.
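Following the style of the earlier configuration examples, an inclusive grouplet with one member node might be defined as follows (the grouplet id, description, and external id are illustrative, and the abbreviated column lists for GROUPLET and GROUPLET_LINK are a best-effort sketch):
insert into SYM_GROUPLET (grouplet_id, grouplet_link_policy, description, create_time, last_update_time) values ('pilot', 'I', 'Pilot stores for new features', current_timestamp, current_timestamp); insert into SYM_GROUPLET_LINK (grouplet_id, external_id, create_time, last_update_time) values ('pilot', '001', current_timestamp, current_timestamp);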
Once you have defined your grouplet and which nodes are members of it, you can tie a grouplet to a given trigger-router through the use of TRIGGER_ROUTER_GROUPLET. If a particular trigger-router does not appear in this table, SymmetricDS behaves as normal. If, however, an entry for a particular trigger-router appears in this table, the default behavior is overridden based on the grouplet_id and applies_when settings.
The grouplet id provides the node list, and the applies_when setting indicates whether the grouplet nodes are to be used to filter the source node list, the target node list, or both (settings are "S", "T", and "B", respectively). Nodes that survive the filtering process as a source will have a trigger defined, and nodes that survive the filtering process as a target are eligible nodes that can be routed to.
At this point, an example would probably be useful. Picture the case where you have 100 retail stores (each containing one database, and each a member of the "store" node group) and a central office database (external id of corp, and a member of the "corp" node group). You wish to pilot two new trigger-routers for a new feature on your point-of-sale software (one which moves data from corp to store, and one which moves data from store to corp), but you only want the triggers to be installed on 10 specific stores that represent your "pilot" stores. In this case, the simplest approach would be to define a grouplet with, say, a grouplet id of "pilot". We'd use a grouplet link policy of "inclusive", and list each of the 10 external ids in the GROUPLET_LINK table.
For the trigger-router meant to send data from corp to store, we'd create an entry in TRIGGER_ROUTER_GROUPLET for our grouplet id of "pilot", and we'd specify "T" (target) as the applies-when setting. In this way, the source node list is not filtered, but the target node list used during routing will filter the potential target nodes to just our pilot stores. For the trigger-router meant to send data from a pilot store back to corp, we would have the grouplet apply when the node is in the source node list (i.e., applies_when will be "S"). This will cause the trigger to only be created for stores in the pilot list and not other stores.
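The two entries for this pilot scenario might be sketched as follows (the trigger and router ids are illustrative, and the abbreviated column list for TRIGGER_ROUTER_GROUPLET is a best-effort sketch):
insert into SYM_TRIGGER_ROUTER_GROUPLET (grouplet_id, trigger_id, router_id, applies_when, create_time, last_update_time) values ('pilot', 'new_feature', 'corp-2-store', 'T', current_timestamp, current_timestamp); insert into SYM_TRIGGER_ROUTER_GROUPLET (grouplet_id, trigger_id, router_id, applies_when, create_time, last_update_time) values ('pilot', 'new_feature', 'store-2-corp', 'S', current_timestamp, current_timestamp);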
An important thing to mention in this example: Since your grouplet only included the store nodes, you can't simply specify "both" for the applies when setting. For the corp-to-store trigger, for example, if you had said "both", no trigger would have been installed in corp since the grouplet nodes represent all possible source nodes as well as target nodes, and "corp" is not in the list! The same is true for the store to corp trigger-router as well. You could, however, use "both" as the applies when if you had included the "corp" external id in with the list of the 10 pilot store external ids.
Parameters can be used to help tune and configure your SymmetricDS configuration. Parameters can be set for an individual node or for all nodes in your network.
See Appendix B, Parameters, for a complete list of parameters.