datconv.filters package

Common data or structures to be used in Datconv Filters.

datconv.filters.SKIP = 0

Name for zero; when a Filter returns this value, the record is skipped.

datconv.filters.WRITE = 1

When a Filter returns this bit, the record is passed to the Writer.

datconv.filters.REPEAT = 2

When a Filter returns this bit, the other bits are processed and control then returns to the Filter function with the same record. This bit is used to generate (produce) data.

datconv.filters.BREAK = 4

When a Filter returns this bit, the import process is stopped and the DCWriter.writeFooter function is called.
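The way these bits steer processing can be sketched with a small driver loop. This is only an illustration under assumptions: `run`, `demo_filter` and the plain-string records are hypothetical stand-ins, and the real Datconv main loop may differ in detail (for example, it also calls the Writer's header and footer functions).

```python
SKIP, WRITE, REPEAT, BREAK = 0, 1, 2, 4  # values documented above


def run(records, filter_record, write):
    """Feed records through filter_record, honouring the returned bits."""
    for record in records:
        while True:
            result = filter_record(record)
            if result & WRITE:
                write(record)
            if not result & REPEAT:
                break  # no REPEAT bit: leave the per-record loop
            # REPEAT bit set: the filter is called again with the same record
        if result & BREAK:
            break  # stop reading further records


calls = {'n': 0}


def demo_filter(record):
    # Hypothetical filter: write every record twice, stop after record 'b'.
    calls['n'] += 1
    if calls['n'] % 2:
        return WRITE | REPEAT
    return WRITE | BREAK if record == 'b' else WRITE


out = []
run(['a', 'b', 'c'], demo_filter, out.append)
print(out)  # ['a', 'a', 'b', 'b']
```

Because the REPEAT bit is checked before BREAK, a record that returns both is processed again first, which matches the precedence rule stated for filterRecord.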

Filter interface

This module contains the Datconv Filter skeleton class, suitable as a starting point for new filters.

class datconv.filters._skeleton.DCFilter[source]

Bases: object

This class must be named exactly DCFilter. It is called after the Reader reads a record and before the Writer writes it. It is able to:

  • filter data (i.e. not pass certain records on to the Writer for placing in the output)
  • change data (i.e. change the contents of certain records on the fly)
  • produce data (i.e. cause certain records, possibly slightly modified, to be sent to the Writer multiple times)
  • break the conversion process (i.e. cause the conversion to stop on a certain record).

Additional parameters may be added to the constructor, but they all have to be named parameters. Parameters are usually passed from the YAML file as subkeys of the Filter:CArg key.

setHeader(header)[source]

Optional method that may be defined in a Filter class. It informs the Filter about the contents of the header and gives it a chance to change it. If this method is present in the Filter, it is called by the Reader before data conversion begins and before the Writer calls writeHeader.

Parameters:header – the header instance as passed by the Reader (always a list, but the type of its elements is up to the Reader). This parameter is later passed to the Writer.

filterRecord(record)[source]

Required method that must be defined in every Filter class. It is called to perform the filter tasks described above.

Parameters:record – is instance of root XML model of record returned by Reader (class of lxml.etree.ElementTree).

This method may check or manipulate the contents of the record.

There are several ways to access data from the current record, e.g.:

  • record.tag - the name of the root tag (i.e. the record name).
  • record.find(xpath) - returns the first matching sub-tag of the record (or None), using a relative, simplified XPath; e.g. record.find('.//TIME') searches the record tree and returns the first <TIME …/> tag found.
  • record.findtext(xpath) - as above, but returns the .text attribute (see below) of the found tag, or None if the tag is not found.
  • record.xpath(xpath) - evaluates a full, absolute XPath expression on the record (i.e. returns a list of matched tags, or a string, number etc., depending on the XPath); e.g. record.xpath('/Gampdf_winNbrs/winSet') returns the list of all winSet sub-tags of the root Gampdf_winNbrs tag.

On the record, and on tags returned by the above methods, the data associated with a tag may be read or changed using:

  • tag.tag - tag name (i.e. field name).
  • tag.text - text contained between the opening and closing tag (usually the data value).
  • tag.keys() - iterable containing the tag's attribute names.
  • tag.attrib['attrib'] - the value of the tag attribute named 'attrib'; raises KeyError if the tag has no such attribute.
  • tag.get('attrib') - as above, but returns None if there is no such attribute.
  • record.insert(0, newtag) - inserts a new tag at the beginning of the record.
  • etree.SubElement(record, 'NEWTAG') - inserts a new tag at the end of the record.

See the lxml package for more documentation.

This method should return a combination of the following bits:

  • WRITE - causes the program to pass the record to the Writer for writing to the output.
  • REPEAT - causes the program to call filterRecord again with the same record (instead of reading the next record from the input). This is used to produce (create) new records. Use this option with caution to avoid an infinite loop (i.e. the Filter should maintain its own replication counter and stop returning REPEAT at some point).
  • BREAK - causes the program to stop processing on this record (i.e. the Reader will not read the next record). When REPEAT | BREAK is returned, the REPEAT bit takes precedence.

or return SKIP (0), which causes the record to be skipped (not passed to the Writer).
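As an illustration of the access methods and return values described above, here is a minimal filterRecord sketch. The TRAILER and SEEN tag names are hypothetical, the flag values are copied locally instead of being imported from datconv.filters, and the fallback to the standard library's ElementTree is only there so the sketch runs without lxml installed.

```python
try:
    from lxml import etree  # the library the record objects come from
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree

SKIP, WRITE, REPEAT, BREAK = 0, 1, 2, 4  # values documented for datconv.filters


class DCFilter:
    def filterRecord(self, record):
        # Stop the conversion on a (hypothetical) TRAILER record, unwritten.
        if record.tag == 'TRAILER':
            return BREAK
        # Skip records that have no TIME sub-tag anywhere in their tree.
        if record.findtext('.//TIME') is None:
            return SKIP
        # Change the record: append a (hypothetical) SEEN tag with value 1.
        etree.SubElement(record, 'SEEN').text = '1'
        return WRITE


wager = etree.fromstring('<TT_WAGER><TIME>12:00</TIME></TT_WAGER>')
other = etree.fromstring('<TT_COMMAND><PRODUCT>7</PRODUCT></TT_COMMAND>')
flt = DCFilter()
results = [flt.filterRecord(wager), flt.filterRecord(other)]
print(results)  # [1, 0]
```

Only find, findtext and SubElement are used here because they behave the same in lxml and in ElementTree; record.xpath is available on lxml elements only.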

setFooter(footer)[source]

Optional method that may be defined in a Filter class. It informs the Filter about the contents of the footer and gives it a chance to change it. If this method is present in the Filter, it is called by the Reader after data conversion and before the Writer calls writeFooter.

Parameters:footer – is instance of footer as passed by Reader (always a list, but type of elements is up to Reader). This parameter is passed later to Writer.
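Putting the three methods together, a filter that touches the header and footer might look like the sketch below. The `note` parameter and the strings appended to the lists are hypothetical, and the sketch assumes that in-place changes to the header and footer lists are what reaches the Writer.

```python
class DCFilter:
    def __init__(self, note='filtered'):
        # 'note' is a hypothetical named parameter; in practice it would be
        # supplied as a subkey of Filter:CArg in the YAML file.
        self._note = note
        self._count = 0

    def setHeader(self, header):
        # header is a list; the type of its elements depends on the Reader.
        header.append(self._note)

    def filterRecord(self, record):
        self._count += 1
        return 1  # WRITE

    def setFooter(self, footer):
        footer.append('records seen: %d' % self._count)


carg = {'note': 'checked'}  # stands in for the Filter:CArg mapping
flt = DCFilter(**carg)
header, footer = [], []
flt.setHeader(header)
flt.filterRecord(object())
flt.setFooter(footer)
print(header, footer)  # ['checked'] ['records seen: 1']
```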

datconv.filters.delfield module

General Filter that removes certain fields from a record.

class datconv.filters.delfield.DCFilter(field=[])[source]

Bases: object

Please see constructor description for more details.

Constructor parameters are usually passed from YAML file as subkeys of Filter:CArg key.

Parameters:field – list of fields to remove. Fields must be in form of XPaths understandable by lxml.etree._Element.find method (relative paths)

For more detailed descriptions see conf_template.yaml file in this module folder.
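The effect of removing fields by relative path can be sketched as follows. This is not the module's actual code: `remove_fields` is a hypothetical helper, and it uses a parent map so that it also runs on the standard library's ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree


def remove_fields(record, fields):
    """Remove every sub-tag of record matched by the relative paths in fields."""
    for path in fields:
        matches = record.findall(path)
        if not matches:
            continue
        # Map each child to its parent; rebuilt per path as the tree changes.
        parent_of = {child: parent
                     for parent in record.iter() for child in parent}
        for tag in matches:
            parent_of[tag].remove(tag)
    return record


rec = etree.fromstring('<REC><A>1</A><B><C>2</C></B></REC>')
remove_fields(rec, ['A', './/C'])
print([child.tag for child in rec])  # ['B']
```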

datconv.filters.rectyp module

General Filter that filters out certain record types.

class datconv.filters.rectyp.DCFilter(inclusive=True, rectyp=[])[source]

Bases: object

Please see constructor description for more details.

Constructor parameters are usually passed from YAML file as subkeys of Filter:CArg key.

Parameters:
  • inclusive – if False, record types given in rectyp are excluded, otherwise only rectyp records are included;
  • rectyp – list of record types (root tags of records).

For more detailed descriptions see conf_template.yaml file in this module folder.
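The inclusive/exclusive decision described for this filter can be sketched as a single comparison. `rectyp_result` is a hypothetical helper written from the parameter descriptions above, not the module's actual code.

```python
SKIP, WRITE = 0, 1  # values documented for datconv.filters


def rectyp_result(tag, rectyp, inclusive=True):
    """Return WRITE or SKIP for a record whose root tag is `tag`."""
    listed = tag in rectyp
    # inclusive: keep only listed types; exclusive: keep only unlisted types.
    return WRITE if listed == inclusive else SKIP


print(rectyp_result('TT_WAGER', ['TT_WAGER']))                   # 1
print(rectyp_result('TT_WAGER', ['TT_WAGER'], inclusive=False))  # 0
```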

datconv.filters.pipe module

General Filter that allows users to run several other filters one after another. The values returned by the configured filters are combined in the following way:

  • to get a record written (sent to the Writer), all filters must set the WRITE bit
  • to get a record repeated, at least one filter must set the REPEAT bit
  • to break the process, at least one filter must set the BREAK bit
  • the REPEAT bit takes precedence over the BREAK bit (i.e. if both are set, the record is re-evaluated)

class datconv.filters.pipe.DCFilter(flist, pass_skiped=True)[source]

Bases: object

Please see constructor description for more details.

Constructor parameters are usually passed from YAML file as subkeys of Filter:CArg key.

Parameters:
  • flist – list of filters to be run in chain with their parameters;
  • pass_skiped – if False, records for which some filter returned SKIP are not passed to the next filters.

For more detailed descriptions see conf_template.yaml file in this module folder.
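The rules for combining the return values of the chained filters can be sketched as a small function. `combine` is a hypothetical name and the flag values are local copies; the sketch only shows the bit logic, not how the pipe filter calls its sub-filters.

```python
SKIP, WRITE, REPEAT, BREAK = 0, 1, 2, 4  # values documented for datconv.filters


def combine(results):
    """Combine the return values of chained filters into one value."""
    # WRITE needs agreement from every filter.
    combined = WRITE if results and all(r & WRITE for r in results) else SKIP
    # REPEAT and BREAK are set as soon as any filter asks for them.
    if any(r & REPEAT for r in results):
        combined |= REPEAT
    if any(r & BREAK for r in results):
        combined |= BREAK
    return combined


print(combine([WRITE, WRITE | BREAK]))   # 5
print(combine([WRITE, SKIP]))            # 0
print(combine([WRITE | REPEAT, WRITE]))  # 3
```

The precedence of REPEAT over BREAK is not applied here; it is left to whoever interprets the combined value.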

datconv.filters.gen_rec module

General Filter that generates new records. Every record is cloned a configurable number of times. This Filter is suitable for subclassing if more robust generation strategies are required.

class datconv.filters.gen_rec.DCFilter(n=1, fake_flg=None)[source]

Bases: object

Please see constructor description for more details.

Constructor parameters are usually passed from YAML file as subkeys of Filter:CArg key.

Parameters:
  • n – determines how many clones are generated for every record.
  • fake_flg – if set (to a string), a tag with that name and the value 1 is added to every generated clone.

For more detailed descriptions see conf_template.yaml file in this module folder.
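Cloning with the REPEAT bit and a replication counter might be sketched as below. `CloneFilter` is a hypothetical class written from the parameter descriptions, not the module's source, and it assumes that the Writer emits the record before the filter is called again with the same object.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree

SKIP, WRITE, REPEAT = 0, 1, 2  # values documented for datconv.filters


class CloneFilter:
    def __init__(self, n=1, fake_flg=None):
        self.n = n
        self.fake_flg = fake_flg
        self._left = None  # clones still to produce; None means new record

    def filterRecord(self, record):
        if self._left is None:
            self._left = self.n  # first visit: this is the original record
        else:
            self._left -= 1      # repeated visit: this pass is a clone
            if self.fake_flg and record.find(self.fake_flg) is None:
                etree.SubElement(record, self.fake_flg).text = '1'
        if self._left > 0:
            return WRITE | REPEAT
        self._left = None        # counter exhausted: stop repeating
        return WRITE


rec = etree.fromstring('<TT_WAGER><PRODUCT>7</PRODUCT></TT_WAGER>')
flt = CloneFilter(n=2, fake_flg='FAKE')
results = [flt.filterRecord(rec) for _ in range(3)]
print(results)  # [3, 3, 1]
```

With n=2 the record is written three times (the original plus two clones), and the counter guarantees that REPEAT is eventually dropped.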

datconv.filters.stat module

General Filter that calculates and prints the required statistics about processed data. For each given XPath expression, the Filter prints the number of the first record in which the expression is met and the number of records in which it is met. Statistics are printed at program exit as logger INFO messages.

class datconv.filters.stat.DCFilter(retval=1, rectyp=True, printzero=False, fields=[])[source]

Bases: object

Please see constructor description for more details.

Constructor parameters are usually passed from YAML file as subkeys of Filter:CArg key.

Parameters:
  • retval – value that filter returns (0 to skip records, 1 to write records);
  • rectyp – if True, record type (root tag) is included into statistics; i.e. it is printed how many records are of particular types.
  • printzero – if True, not found records (with count 0) are included into summary (except when grouping is used)
  • fields – list of 2-element lists:
    - first element is absolute XPath expression to make statistics against (lxml.etree._Element.xpath method compatible)
    - second element is a digit:
    0 - if we test against element existence (i.e. not None and not [])
    1 - if we are grouping against element value
    2 - if given XPath expression returns boolean value.

For more detailed descriptions see conf_template.yaml file in this module folder.
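The record-type part of these statistics (the first record number and the count per root tag) can be sketched with a dictionary. `RecTypeStat`, the summary format and numbering records from 1 are assumptions made for illustration; the XPath-based statistics are not covered.

```python
class RecTypeStat:
    """Track, per root tag, the first record number and the record count."""

    def __init__(self):
        self.recno = 0
        self.stats = {}  # tag -> [first record number, count]

    def update(self, tag):
        self.recno += 1
        entry = self.stats.setdefault(tag, [self.recno, 0])
        entry[1] += 1

    def summary(self):
        return ['%s: first=%d count=%d' % (tag, first, count)
                for tag, (first, count) in self.stats.items()]


stat = RecTypeStat()
for tag in ['TT_WAGER', 'TT_COMMAND', 'TT_WAGER']:
    stat.update(tag)
print(stat.summary())
```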

datconv.filters.statex module

General Filter that calculates and prints the required statistics about processed data (extended version). The Filter prints counts or sums of records that fulfill a given expression, with an option to group by certain data. Statistics are printed at program exit as logger INFO messages or written to a file.

class datconv.filters.statex.DCFilter(retval=1, fields=[], statfile=None, statwriter=None)[source]

Bases: object

Please see constructor description for more details.

Constructor parameters are usually passed from YAML file as subkeys of Filter:CArg key.

Parameters:
  • retval – value that filter returns (0 to skip records, 1 to write records);
  • fields – list of 5 or 6 elements’ lists that define calculated statistics.
  • statfile – file to write final statistics
  • statwriter – datconv writer module to write final statistics

For more detailed descriptions see conf_template.yaml file in this module folder.

Configuration keys

Listing of all possible configuration keys to be used with filters contained in this package.

Sample values are given; if a key is not specified in the configuration file, then the default value is assumed.

Filter: 
    Module: datconv.filters.rectyp
    CArg:
        # If False, record types given in rectyp are excluded, otherwise only rectyp records are included.
        # default: true
        inclusive: true
        
        # List of record types (root tags of records).
        # default: []
        rectyp: []
        
Filter: 
    Module: datconv.filters.delfield
    CArg:
        # List of fields to remove.
        # Fields must be in form of XPaths understandable by lxml.etree._Element.find method (relative paths)
        # default: []
        field: []
        
Filter: 
    Module: datconv.filters.pipe
    CArg: 
        # List of filters to be run in chain with their parameters (obligatory parameter)
        flist:
            - Module: datconv.filters.rectyp
              CArg: 
                  rectyp: []
            - Module: datconv.filters.delfield
              CArg: 
                  field: []
        
        # If it is False, records for which some filter returned SKIP will not be passed to next filters
        # default: true
        pass_skiped: true
        
Filter: 
    Module: datconv.filters.gen_rec
    CArg:
        # Determines how many clones are generated for every record.
        # default: 1
        n: 5
        
        # If set (to string) a tag of set name is added to every generated clone with the value 1.
        # default: null
        fake_flg: FAKE
        
Filter: 
    Module: datconv.filters.stat
    CArg: 
        # Value that filter returns (0 to skip records, 1 to write records)
        # default: 1
        retval: 0
        
        # If true, record type (root tag) is included into statistics
        # i.e. it is printed how many records are of particular types.
        # default: true
        rectyp: true

        # If true, not found records (with count 0) are included into summary (except when grouping is used)
        # default: false
        printzero: false
        
        # List of 2 elements' lists:
        # first element is absolute XPath expression to make statistics against 
        # (lxml.etree._Element.xpath method compatible)
        # second element is a digit:
        # 0 - if we test against element existence (i.e. not None and not [])
        # 1 - if we are grouping against element value
        # 2 - if given XPath expression returns boolean value.
        # default: []
        fields:
            - [/TT_COMMAND/PRODUCT, 1]
            - [/TT_WAGER/PRODUCT, 1]
            - [/TT_WAGER/PRODUCT=7, 2]
            - [/TT_COMMAND/WIN_CDC, 0]

Filter: 
    Module: datconv.filters.statex
    CArg: 
        # Value that filter returns (0 to skip records, 1 to write records)
        # default: 1
        retval: 1
        
        # List of 5 or 6 elements' lists:
        # 1st element is statistic name used only in output summary 
        # 2nd element is record name for which to evaluate the statistic; if null, evaluate for every record
        # 3rd element is XPath expression or boolean; if it evaluates to a non-empty list, text, non-zero number or true, the statistic is updated;
        #             if it is true statistic is updated unconditionally; if false - never updated
        # 4th element is XPath expression used for grouping or null for global (all data) grouping
        # 5th element is either 'c' (count) or 's' (sum) small letter
        # 6th element is XPath expression that returns numeric or value castable to numeric;
        #             it determines what to sum if 5th element is 's', or has no meaning (may be absent) otherwise
        # All XPath expressions must be absolute and lxml.etree._Element.xpath method compatible.
        # Note: only subset of full XPath specification is currently supported in lxml - check your version of this package
        # default: []
        fields:
            - [BAL.WAGCNT, DEFRECBAL, true, number(//PROD_NUM), s, //WAGCNT]
            - [TMF.WAGCNT, TT_WAGER, //_UPDATE_MONEY=1, number(//PRODUCT), c]
            - [BAL.WAGAMT, DEFRECBAL, true, number(//PROD_NUM), s, //WAGAMT]
            - [TMF.WAGAMT, TT_WAGER, //_UPDATE_MONEY=1, number(//PRODUCT), s, number(//_AMOUNT)]

        # Full path to file to write final statistics
        # default: null
        statfile: /tmp/statfile.xml
        
        # datconv writer module to write final statistics
        # All datconv compatible Writer modules can be used here.
        # The keys below are the same keys that you normally have under Writer root key in YAML file.
        # This key must be set together with above statfile key.
        # If this key is null final statistics are being sent to configured logger as info messages.
        # default: null
        statwriter:
            Module: datconv.writers.dcxml
            CArg: 
                pretty: false