November 25, 2015

Illustrating the implementation of IBM Informix TimeSeries Swing Door (Linear) Compression Algorithm


It's proven that, IBM Informix TimeSeries feature efficiently handles time stamped data sets when compared to other Relational databases. In best cases, the TimeSeries implementation gives you over 50% of disk space savings over other relational databases.

IBM Informix TimeSeries allows you to further compress this time stamped data, provided that the datasets are being recorded at REGULAR intervals and are NUMERIC in nature. Say, for example, you can compress an 8-byte BIGINT value down to 1 byte, with some loss of precision. That's a huge saving, isn't it?

One can compress TimeSeries data using one of the following algorithms:

1. Quantization : The Quantization algorithm helps create discrete grids, representing a range of values and then divides the values into these grids, based on the upper and lower boundaries. Fewer bytes are needed to represent a grid than a numeric value, resulting in compression, however, this mode of compression can be lossy.
2. Linear            : As against the grids maintained by Quantization method, the Linear algorithm represents the values within line segments, defined by two end points. The algorithm does not record a value if it falls within the agreed deviation. If the value deviates too much from the last recorded value, only then will the new value be recorded.

IBM Informix TimeSeries provides two variants of Linear Compression algorithms - BoxCar and Swing Door, to achieve fast Read/Write performance and High Compression, respectively.

In this blog work 'Illustrating the implementation of IBM Informix TimeSeries Swinging Door Compression Algorithm', as the title indicates, we shall work on implementing 'Swing Door' algorithm, a variant of Linear Compression method.

Be informed that, the implementation of the Swing Door algorithm needs to be done while initializing the TimeSeries table. 

Example:
insert into test_ts values('tag1',TSCreateIrr('calsec', '2015-07-01 00:00:00.00000','tscont1', 'ls(2)'));

Where:
ls          - Indicates that Swing Door algorithm is being applied to the first column after the time stamp, i.e the column following the DATETIME column (In Row type), which is of numeric data type. In the sample code shared in this blog work, the algorithm is applied to a column with FLOAT type. 
(2)    - Indicates the Accepted Deviation, indicating that, if the difference between the current value and previous value is less than 2, then the record is not stored and if the differential value exceeds 2, only then is the record stored.

For the ease of understanding and usage, the implementation steps have been organized as part of a simple shell script to help understand the flow, in terms of working with basic objects of TimeSeries viz. Calendar, Container, TimeSeries, Virtual Table Interface (VTI) and SQL API. It takes no more than 10min to run the shell script and understand Informix TimeSeries syntax with along with the implementation of Swing Door Compression Algorithm.

1.      The shell script is well tested on Linux and might need minor changes on Unix platform. It won’t work on Windows platform. However, you can copy paste SQL statements high lighted in blue in following section and execute on windows manually.
2.      Ensure that Informix engine is online. You can use “onstat –“  command
3.      Download the shell script available in the zip file (Or Copy - Paste the following link on to the browser:
https://drive.google.com/open?id=0B4mxiMcgUauaMENkNVdxZXJRQlU )
4.      Unzip the file on linux/Unix machine under some test directory
5.      You might need to execute dos2unix command for each file before execution
6.      Ensure that 'main.sh' and 'func.sh' have Execute Privileges
7.      Execute main.sh and just follow the instructions.

Output of shell script should look like this :

################################################################
Welcome to TimeSeries demo.
This program has sections that demonstrate capabilities of TimeSeries.
Flow in each section is Setup, Data Load & Data Retrieval
please enter your choice of TimeSeries demo
 1      : TimeSeries Demo
 2      : JSON Demo
 3      : TimeSeries with JSON Demo
 4      : TimeSeries with Rolling Window Concept
 5      : TimeSeries with Hertz Frequency Concept
 6      : TimeSeries with Swinging Door Compression Algorithm

6

#########################################################################
 You have chosen a Demo on Informix TimeSeries with Swinging Door Compression
#########################################################################

Please read the options carefully :-
If you are executing the TimeSeries Demo for the first time, then
        Press '1'       : To Perform the Setup and create the Demo Environment

If you have already performed the setup for TimeSeries Demo, then
        Press ANY NUMBER: To Execute Queries against the prepared Setup

1

Cleaning up old files...
Clean up is over. Please presss Enter to continue

################################################################
Creating dbspace dbspace1

execute function admin('create dbspace','dbspace1',
'/home/amprasan/IDS1210/storage/dbspace1','100 MB','0')
45107280         6        0x60001    6        1        2048     N  BA    informix dbspace1
471ba028         6      6      0          10000      4851                  PO-B-D /home/amprasan/IDS1210/storage/dbspace1

dbspace creation .. passed
Please press Enter to continue

################################################################
Creating database demo_db

drop database if exists demo_db;
create database demo_db in dbspace1 with buffered log

database creation .. passed
Please press Enter to continue

################################################################
Creating row type row1

create row type row1
(
 time_stamp datetime year to fraction(5),
 value float
);

Row creation .. passed
Please press Enter to continue

################################################################
Creating calendars calsec, cal1min and cal15min

execute procedure ifx_allow_newline('t');
insert into calendarpatterns values
(
 'sec','{1 on},second'
);
insert into calendarpatterns values
(
 'min','{1 on},minute'
);
insert into calendarpatterns values
(
 'min15','{1 on ,14 off},minute'
);
insert into CalendarTable
(
 c_name,c_calendar
)
values ('calmin','startdate(2015-07-01 00:00:00.00000),
pattstart(2015-07-01 00:00:00.0000),pattname(min)');

insert into CalendarTable
(
 c_name,c_calendar
)
values ('calsec','startdate(2015-07-01 00:00:00.00000),
pattstart(2015-07-01 00:00:00.0000),pattname(sec)');

insert into CalendarTable
(
 c_name,c_calendar
)
values ('cal15min','startdate(2015-07-01 00:00:00.00000),
pattstart(2015-07-01 00:00:00.0000),pattname(min15)');

Calendar creation .. passed
Please press Enter to continue

################################################################
Creating table test_ts

create table test_ts
 (
   tagid char(20),
   series timeseries(row1),
   PRIMARY KEY (tagid)
) in dbspace1;
alter table test_ts
lock mode(row);

Table creation .. passed
Please press Enter to continue

################################################################
Creating container tscont1

execute procedure tscontainercreate('tscont1', 'dbspace1','row1',1024,1024);

Container creation .. passed
Please press Enter to continue

################################################################
Creating virtual table test_v

execute procedure TSCreateVirtualTab('test_v','test_ts',4096);

Virtual table creation .. passed
Please press Enter to continue


################################################################
Implementation of the Swing Door algorithm needs to be done while initializing the TimeSeries table. Here, in the following statement:
ls - Indicates usage of Swing Door algorithm. In this case, it is being applied to the first column after the time stamp, i.e the column following the DATETIME column (In Row type), which is of numeric data type.
(2)    - Indicates the Accepted Deviation, indicating that, if the difference between the current value and previous value is less than 2, then the record is not stored and if the differential value exceeds 2, only then is the record stored.
################################################################
Initializing all tags tag1 to tag5

BEGIN;
insert into test_ts values('tag1',TSCreateIrr('calsec', '2015-07-01 00:00:00.00000','tscont1', 'ls(2)'));
insert into test_ts values('tag2',TSCreateIrr('calsec', '2015-07-01 00:00:00.00000','tscont1', 'ls(2)'));
insert into test_ts values('tag3',TSCreateIrr('calsec', '2015-07-01 00:00:00.00000','tscont1', 'ls(2)'));
insert into test_ts values('tag4',TSCreateIrr('calsec', '2015-07-01 00:00:00.00000','tscont1', 'ls(2)'));
insert into test_ts values('tag5',TSCreateIrr('calsec', '2015-07-01 00:00:00.00000','tscont1', 'ls(2)'));
COMMIT;


Tags initialization .. passed
Please press Enter to continue

################################################################
Load data for Tags 1 to 5 through VTI

load from sw_cmp_load.unl insert into test_v;

Loading data through VTI .. passed
Please press Enter to continue

Congratulations!!! Your setup is successful. Please press Enter to refresh the screen and be ready to see certain data retrieval queries

################################################################
Simple select on VTI table to see range of data

select * from test_v
where tagid='tag1'
and time_stamp between '2015-07-01 01:00:00.00000' and
'2015-07-01 01:10:00.00000';

Press enter to see output

Database selected.

tagid                time_stamp                         value

tag1                 2015-07-01 01:00:00.00000 108.0000000000
tag1                 2015-07-01 01:01:00.00000 109.0000000000
tag1                 2015-07-01 01:02:00.00000 100.0000000000
tag1                 2015-07-01 01:03:00.00000 101.0000000000
tag1                 2015-07-01 01:04:00.00000 102.0000000000
tag1                 2015-07-01 01:05:00.00000 103.0000000000
tag1                 2015-07-01 01:06:00.00000 104.0000000000
tag1                 2015-07-01 01:07:00.00000 105.0000000000
tag1                 2015-07-01 01:08:00.00000 106.0000000000
tag1                 2015-07-01 01:09:00.00000 107.0000000000
tag1                 2015-07-01 01:10:00.00000 108.0000000000

11 row(s) retrieved.

Database closed.

Press enter to continue

################################################################
Simple select on TimeSeries table to see range of data

select 'tag1',
clip(series,'2015-07-01 01:00:00.00000'::datetime year to fraction(5) ,
'2015-07-01 01:10:00.00000'::datetime year to fraction(5))
from test_ts where tagid='tag1';

Press enter to see output

Database selected.

(constant)    tag1
(expression)  origin(2015-07-01 01:00:00.00000), calendar(calsec), container(tscont1), threshold(0), irregular, [(108.0000000000)@2015-07-01 01:00:00.00000, (109.0000000000)@2015-07-01 01:01:00.00000, (100.0000000000)@2015-07-01 01:02:00.00000, (101.0000000000)@2015-07-01 01:03:00.00000, (102.0000000000)@2015-07-01 01:04:00.00000, (103.0000000000)@2015-07-01 01:05:00.00000, (104.0000000000)@2015-07-0101:06:00.00000, (105.0000000000)@2015-07-01 01:07:00.00000, (106.0000000000)@2015-07-01 01:08:00.00000, (107.0000000000)@2015-07-01 01:09:00.00000, (108.0000000000)@2015-07-01 01:10:00.00000]

1 row(s) retrieved.

Database closed.

Press enter to continue

################################################################
Select on TimeSeries table and output in traditional relational form

select 'tag1',time_stamp,value from
table (( select tssettolist(clip(series,
'2015-07-01 01:00:00.00000'::datetime year to fraction(5),
'2015-07-01 01:10:00.00000'::datetime year to fraction(5)))
::list(row1 not null)
from test_ts
where tagid='tag1'));

Press enter to see output

Database selected.

(constant) time_stamp                         value

tag1       2015-07-01 01:00:00.00000 108.0000000000
tag1       2015-07-01 01:01:00.00000 109.0000000000
tag1       2015-07-01 01:02:00.00000 100.0000000000
tag1       2015-07-01 01:03:00.00000 101.0000000000
tag1       2015-07-01 01:04:00.00000 102.0000000000
tag1       2015-07-01 01:05:00.00000 103.0000000000
tag1       2015-07-01 01:06:00.00000 104.0000000000
tag1       2015-07-01 01:07:00.00000 105.0000000000
tag1       2015-07-01 01:08:00.00000 106.0000000000
tag1       2015-07-01 01:09:00.00000 107.0000000000
tag1       2015-07-01 01:10:00.00000 108.0000000000

11 row(s) retrieved.

Database closed.

Press enter to continue

################################################################
Roll up data to 15min cycle using aggregateby function

SELECT *
FROM TABLE (( SELECT TSSetToList(AggregateBy('avg($value)', 'cal15min',
series, 0, '2015-07-01 00:00'::datetime year to minute,
'2015-07-01 2:30'::datetime year to minute))::list(row1 not null)
FROM test_ts
WHERE tagid='tag1'));

Press enter to see output

Database selected.

time_stamp                         value

2015-07-01 00:15:00.00000 103.4000000000
2015-07-01 00:30:00.00000 104.9375000000
2015-07-01 00:45:00.00000 104.3125000000
2015-07-01 01:00:00.00000 104.9375000000
2015-07-01 01:15:00.00000 104.3125000000
2015-07-01 01:30:00.00000 104.9375000000
2015-07-01 01:45:00.00000 104.3125000000
2015-07-01 02:00:00.00000 104.9375000000
2015-07-01 02:15:00.00000 104.3125000000
2015-07-01 02:30:00.00000 104.9375000000

10 row(s) retrieved.

Database closed.

Press enter to continue

Congratulations!!! Your Demo is completed Successfully.
We hope you had a pleasant learning experience

Click Here to read further on Compression types available with Informix TimeSeries

Click Here to take a look at an illustrated example to implement various Compression Algorithms as part of TimeSeries implementation.

No comments: