datconv.outconn.crate package

Module that supports storing data in a Crate database.

General interface description: Output Connector interface.

Note

In this module the options lowercase and no_underscore are enabled by default, because Crate (versions 2.2 and 2.3) sometimes silently converts field names to lower case and does not load data (using the COPY command) if fields begin with an underscore.

datconv.outconn.crate.ddl module

This module implements a Datconv Output Connector that generates a CREATE TABLE clause in the crate.io database dialect. The output of this connector should be treated as a starting point for further manual editing; it is not recommended to use it for automatic table generation. This connector should be used with the Writer datconv.writers.dcxpaths with options add_header: false and add_type: true.

class datconv.outconn.crate.ddl.DCConnector(table, path, mode='w', schema='doc', check_keywords=True, lowercase=1, no_underscore=1, column_constraints={}, common_column_constraints=[], table_constraints=[])[source]

Bases: object

Please see constructor description for more details.

Parameters are usually passed from YAML file as subkeys of OutConnector:CArg key.

Parameters:
  • table – name of the table.
  • path – relative or absolute path to output file.
  • mode – output file opening mode.
  • schema – schema of the table.
  • check_keywords – if true, prevents conflicts with SQL keywords. Data field names that are in conflict will be suffixed with an underscore.
  • lowercase – if >1, all JSON keys will be converted to lower case; if =1, only first-level keys; if =0, no conversion happens.
  • no_underscore – if >1, leading _ will be removed from all JSON keys; if =1, only from first-level keys; if =0, the option is disabled.
  • column_constraints – dictionary: key=column name, value=column constraint.
  • common_column_constraints – column constraints to be added after the column definitions. Should be declared as a list of strings.
  • table_constraints – table constraints and creation options. Should be declared as a list of strings.

For more detailed descriptions see conf_template.yaml file in this module folder.
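The level-dependent behaviour of the lowercase and no_underscore options (shared by all connectors in this package) can be illustrated with a short sketch. This is not the actual datconv implementation, only a minimal model of the documented semantics; the function name normalize_keys is hypothetical.

```python
# Illustrative sketch (not datconv source) of the documented semantics:
# 0 = option off, 1 = apply to first-level keys only, >1 = apply at every level.
def normalize_keys(obj, lowercase=1, no_underscore=1, level=1):
    """Return a copy of obj with JSON keys transformed per the options."""
    if not isinstance(obj, dict):
        return obj
    out = {}
    for key, value in obj.items():
        new_key = key
        if lowercase > 1 or (lowercase == 1 and level == 1):
            new_key = new_key.lower()
        if no_underscore > 1 or (no_underscore == 1 and level == 1):
            new_key = new_key.lstrip('_')  # strip leading underscore(s)
        out[new_key] = normalize_keys(value, lowercase, no_underscore, level + 1)
    return out

record = {'_ID': 1, 'Rec': {'_Value': 5025}}
normalize_keys(record, lowercase=1, no_underscore=1)
# → {'id': 1, 'rec': {'_Value': 5025}}   (nested key untouched at level 1)
normalize_keys(record, lowercase=2, no_underscore=2)
# → {'id': 1, 'rec': {'value': 5025}}    (applied at every level)
```

With the defaults (both options set to 1) only top-level keys are normalized, which matches the Crate behaviour described in the package note above.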

datconv.outconn.crate.json module

This module implements a Datconv Output Connector that saves data to a file containing one JSON object per line. Such files can serve as input files for the COPY FROM clause used in the crate.io database. This connector should be used with the Writer datconv.writers.dcjson.

class datconv.outconn.crate.json.DCConnector(path, check_keywords=True, lowercase=1, no_underscore=1, move_to_front=[], cast=None)[source]

Bases: object

Please see constructor description for more details.

Parameters are usually passed from YAML file as subkeys of OutConnector:CArg key.

Parameters:
  • path – relative or absolute path to output file.
  • check_keywords – if true, prevents conflicts with SQL keywords.
  • lowercase – if >1, all JSON keys will be converted to lower case; if =1, only first-level keys; if =0, no conversion happens.
  • no_underscore – if >1, leading _ will be removed from all JSON keys; if =1, only from first-level keys; if =0, the option is disabled.
  • move_to_front – first-level keys listed here will be placed at the beginning of the record. This option requires the dcjson Writer option preserve_order to be set.
  • cast – array of arrays of the form [[‘rec’, ‘value’], str], which means that the record {“rec”: {“value”: 5025}} will be written as {“rec”: {“value”: “5025”}} - i.e. it is ensured that “value” will always be a string. The first position determines the address of the data to be converted; the last position specifies the type: str, bool, int, long or float. Field names should be given as they appear after all other configured transformations (lowercase, no_underscore, check_keywords).

For more detailed descriptions see conf_template.yaml file in this module folder.
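The cast option described above can be sketched in a few lines. This is an illustration of the documented behaviour, not the connector's actual code; the helper name apply_cast is hypothetical.

```python
# Illustrative sketch (not datconv source): apply one 'cast' entry of the
# form [['rec', 'value'], str] to a record, as described above.
def apply_cast(record, path, target_type):
    """Walk the key path and convert the addressed value in place."""
    node = record
    for key in path[:-1]:       # descend to the parent of the target field
        node = node[key]
    node[path[-1]] = target_type(node[path[-1]])
    return record

rec = {"rec": {"value": 5025}}
apply_cast(rec, ['rec', 'value'], str)
# rec is now {"rec": {"value": "5025"}}
```

Because the path addresses the record after all key transformations, a field renamed by lowercase or no_underscore must be referenced by its transformed name.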

datconv.outconn.crate.insert module

This module implements a Datconv Output Connector that directly inserts data into a crate.io database. This connector should be used with the Writer datconv.writers.dcjson called with option preserve_order: true. It requires the Python package crate to be installed. TODO: Add support for ON CONFLICT

class datconv.outconn.crate.insert.DCConnector(table, connstring, user=None, password=None, schema='doc', dump_sql=False, bulk_size=10000, check_keywords=True, lowercase=1, no_underscore=1, cast=None, on_conflict=None, options=[])[source]

Bases: object

Please see constructor description for more details.

Parameters are usually passed from YAML file as subkeys of OutConnector:CArg key.

Parameters:
  • table – table name where to insert records.
  • connstring – connection string to database.
  • user – user name for database connection.
  • password – password for database connection.
  • schema – table schema name where to insert records.
  • dump_sql – if true, insert statements are saved to the file specified as connstring and not inserted into the database (option to be used for debugging).
  • bulk_size – if consecutive records have a similar structure (i.e. have the same fields), they are grouped into one pack (up to the size specified by this parameter) and inserted with one command. If set to 0, every record is inserted individually - warning: this is slow.
  • check_keywords – if true, prevents conflicts with SQL keywords.
  • lowercase – if >1, all JSON keys will be converted to lower case; if =1, only first-level keys; if =0, no conversion happens.
  • no_underscore – if >1, leading _ will be removed from all JSON keys; if =1, only from first-level keys; if =0, the option is disabled.
  • cast – array of arrays of the form [[‘rec’, ‘value’], str], which means that the record {“rec”: {“value”: 5025}} will be written as {“rec”: {“value”: “5025”}} - i.e. it is ensured that “value” will always be a string. The first position determines the address of the data to be converted; the last position specifies the type: str, bool, int, long or float. Field names should be given as they appear after all other configured transformations (lowercase, no_underscore, check_keywords).
  • on_conflict – specifies what to do when a record with the given primary key already exists in the table; one of the strings ‘update’ or None (raise an error in that situation).
  • options – array of additional options added to the INSERT clause (see Crate documentation); it may also be an ON DUPLICATE KEY phrase with non-default settings.

For more detailed descriptions see conf_template.yaml file in this module folder.
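The bulk_size grouping described above can be sketched as follows. This is a minimal model of the documented batching rule, not the connector's actual code; the function name group_records is hypothetical.

```python
# Illustrative sketch (not datconv source): group consecutive records that
# share the same set of fields into packs of at most bulk_size, so each pack
# can be inserted with a single bulk command. bulk_size=0 disables packing.
def group_records(records, bulk_size=10000):
    packs, current, current_keys = [], [], None
    for rec in records:
        keys = tuple(rec.keys())
        same = (keys == current_keys
                and bulk_size > 0
                and len(current) < bulk_size)
        if not same and current:
            packs.append(current)   # flush the finished pack
            current = []
        current_keys = keys
        current.append(rec)
    if current:
        packs.append(current)
    return packs

rows = [{'a': 1}, {'a': 2}, {'b': 3}]
group_records(rows, bulk_size=10)
# → [[{'a': 1}, {'a': 2}], [{'b': 3}]]   (structure change starts a new pack)
```

A larger bulk_size therefore helps most when the input stream contains long runs of identically-shaped records.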

Configuration keys

Listing of all possible configuration keys to be used with output connectors contained in this package.

Sample values are given; if a key is not specified in the configuration file, then the default value is assumed.

OutConnector:
    Module: datconv.outconn.crate.ddl
    CArg:
      # name of the table; obligatory
      table: product
      
      # relative or absolute path to output file; obligatory
      path: "out/AddnDrawNbrs_c5019_s38_2.sql"
        
      # output file opening mode (w or a); optional
      # default: w
      mode: a
      
      # schema of the table
      # default: doc
      schema: doc
      
      # if true, prevents conflicts with SQL keywords;
      # data field names that are in conflict will be suffixed with an underscore.
      # default: true
      check_keywords: true
      
      # if >1, all JSON keys will be converted to lower-case; 
      # if =1, only first level keys; 
      # if =0, no conversion happens.
      # default: 1
      lowercase: 1
      
      # if >1, leading ``_`` will be removed from all JSON keys; 
      # if =1, only from first level of keys; 
      # if =0, option is disabled.
      # default: 1
      no_underscore: 1
      
      # dictionary: key=column name, value=column constraint; optional.
      # default: {}
      column_constraints:
        cdc: NOT NULL
      
      # column constraints to be added after column definitions. Should be a list.
      # default: []
      common_column_constraints:
        - PRIMARY KEY(cdc, isn)
      
      # table constraints and creation options. Should be a list.
      # default: []
      table_constraints:
        - PARTITION BY (cdc)

OutConnector:
    Module: datconv.outconn.crate.json
    CArg:
      # relative or absolute path to output file; obligatory
      path: "out/AddnDrawNbrs_c5019_s38_2.json"
      
      # if true, prevents conflicts with SQL keywords;
      # data field names that are in conflict will be suffixed with an underscore.
      # default: true
      check_keywords: true
      
      # if >1, all JSON keys will be converted to lower-case;
      # if =1, only first level keys; 
      # if =0, no conversion happens.
      # default: 1
      lowercase: 1
      
      # if >1, leading ``_`` will be removed from all JSON keys; 
      # if =1, only from first level of keys; 
      # if =0, option is disabled.
      # default: 1
      no_underscore: 1
      
      # list of first-level keys that will be placed at the beginning of the record.
      # This option requires ``dcjson`` Writer option ``preserve_order`` to be set.
      # default: []
      move_to_front: ['cdc', 'recno']
      
      # array of arrays of the form: [['rec', 'value'], str], which means that record: {"rec": {"value": 5025}} 
      # will be written as {"rec": {"value": "5025"}} - i.e. it is ensured that "value" will always be a string. 
      # First position determines the address of data to be converted, last position specifies the type: str, bool, int, long or float. 
      # Field names should be given after all other configured transformations (lowercase, no_underscore, check_keywords)
      # default: none
      cast:
          - [['rec', 'value'], str]

OutConnector:
    Module: datconv.outconn.crate.insert
    CArg:
      # name of the table; obligatory
      table: product
      
      # connection string to database; obligatory
      connstring: http://192.168.1.15:4200
      
      # user name for databse connection
      # default: none
      user: crate
      
      # password for database connection
      # default: none
      password: none
      
      # schema of the table
      # default: doc
      schema: doc
      
      # if true, prevents conflicts with SQL keywords;
      # data field names that are in conflict will be suffixed with an underscore.
      # default: true
      check_keywords: true
      
      # if >1, all JSON keys will be converted to lower-case;
      # if =1, only first level keys; 
      # if =0, no conversion happens.
      # default: 1
      lowercase: 1
      
      # if >1, leading ``_`` will be removed from all JSON keys; 
      # if =1, only from first level of keys; 
      # if =0, option is disabled.
      # default: 1
      no_underscore: 1
      
      # if true, insert statements are saved to the file specified as connstring and not inserted into the database (for debugging).
      # default: false
      dump_sql: true
      
      # if consecutive records have similar structure (i.e. have the same fields) 
      # they are grouped into one pack (up to the size specified by this parameter) and inserted with one command. 
      # If set to 0, every record is inserted individually - warning: this is slow.
      # default: 10000
      bulk_size: 5000
      
      # array of arrays of the form: [['rec', 'value'], str], which means that record: {"rec": {"value": 5025}} 
      # will be written as {"rec": {"value": "5025"}} - i.e. it is ensured that "value" will always be a string. 
      # First position determines the address of data to be converted, last position specifies the type: str, bool, int, long or float. 
      # Field names should be given after all other configured transformations (lowercase, no_underscore, check_keywords)
      # default: none
      cast:
          - [['rec', 'value'], str]

      # specifies what to do when a record with the given primary key already exists in the table; 
      # one of the strings 'update' or None (raise an error in that situation).
      # default: none
      on_conflict: update
      
      # array of additional options added to the INSERT clause (see Crate documentation); 
      # it may also be an ON DUPLICATE KEY phrase with non-default settings.
      # default: []
      options: []