Tutorial, Samples¶
Note
The examples below are suited for a Linux system. On a Windows system:
- use Windows file paths
- replace the main program name datconv with datconv-run.py
- if you use the Notepad++ editor, use the .yml configuration file extension rather than .yaml; this ensures that syntax highlighting works out of the box.
Minimal XML to JSON conversion example¶
Let's say we have the following XML file, file.xml, that we want to convert to JSON format:
<Data>
<Hello>World</Hello>
</Data>
First, we have to write a simple configuration file, conv.yaml:
Reader:
Module: datconv.readers.dcxml
CArg:
bratags: [Data]
PArg:
inpath: ./file.xml
outpath: ./file.json
Writer:
Module: datconv.writers.dcjson
then run the Datconv tool against this file:
datconv ./conv.yaml
or skip the PArg: key in the configuration file and pass the file names on the command line:
datconv ./conv.yaml --Reader:PArg:inpath:./file.xml --Reader:PArg:outpath:./file.json
The following file.json file will be created:
[
{"Data": {}},
{"Hello": "World"}
]
If you don’t want the Data object to be in the output file, an additional parameter must be added to the
writer’s arguments in the configuration file:
Writer:
Module: datconv.writers.dcjson
CArg:
add_header: false
If your input file looks like this:
<Data>
<Hello World="!" />
</Data>
you typically want to add another argument to the configuration file:
Writer:
Module: datconv.writers.dcjson
CArg:
add_header: false
with_prop: true
to obtain the output:
[
{"Hello": {"World": "!"}}
]
or, with yet another option, obtain indented output:
Writer:
Module: datconv.writers.dcjson
CArg:
add_header: false
with_prop: true
json_opt:
indent: 2
[
{
"Hello": {
"World": "!"
}
}
]
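The indented output above suggests that json_opt entries are forwarded to Python's JSON serializer. Assuming they map to json.dump keyword arguments (an assumption, not confirmed here), indent: 2 would correspond to:

```python
import json

# Assumption: json_opt keys are passed through as json.dump/json.dumps
# keyword arguments; indent=2 then reproduces the record layout above.
record = {"Hello": {"World": "!"}}
print(json.dumps(record, indent=2))
```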
JSON to XML conversion example¶
Let's say we have a JSON query result returned by the cbq tool from Couchbase, saved in file cb.json,
that we want to convert to XML:
{
"requestID": "f5a71946-275a-45ef-a13e-f2a335b9b84b",
"signature": {
"name": "json",
"phone": "json"
},
"results": [
{
"name": "Hilton Chambers",
"phone": "+44 161 236-4414"
},
{
"name": "Sachas Hotel",
"phone": null
},
{
"name": "The Mitre Hotel",
"phone": "+44 161 834-4128"
}
],
"status": "success",
"metrics": {
"elapsedTime": "9.516157ms",
"executionTime": "9.488693ms",
"resultCount": 3,
"resultSize": 253,
"sortCount": 3
}
}
The configuration file would look like this:
Reader:
Module: datconv.readers.dcijson_keys
CArg:
headkeys: [requestID, signature]
reckeys: [results]
footkeys: [status, metrics]
PArg:
inpath: ./cb.json
outpath: ./cb.json.xml
Writer:
Module: datconv.writers.dcxml
CArg:
pretty: true
and the output file cb.json.xml:
<?xml version="1.0" encoding="UTF-8"?>
<Datconv>
<requestID val="f5a71946-275a-45ef-a13e-f2a335b9b84b"/>
<signature phone="json" name="json"/>
<results>
<name>Hilton Chambers</name>
<phone>+44 161 236-4414</phone>
</results>
<results>
<name>Sachas Hotel</name>
<phone>None</phone>
</results>
<results>
<name>The Mitre Hotel</name>
<phone>+44 161 834-4128</phone>
</results>
<status val="success"/>
<metrics sortCount="3" executionTime="9.488693ms" elapsedTime="9.516157ms" resultCount="3" resultSize="253"/>
</Datconv>
XML to CSV conversion example¶
Let's say we want to convert the output XML file from the above example to CSV.
Configuration file:
Reader:
Module: datconv.readers.dcxml
CArg:
rectags: [results]
PArg:
inpath: ./cb.json.xml
outpath: ./cb.xml.csv
Writer:
Module: datconv.writers.dccsv
CArg:
columns:
- ['name','*','name',null]
- ['phone','*','phone',null]
and the output file:
name,phone
Hilton Chambers,+44 161 236-4414
Sachas Hotel,None
The Mitre Hotel,+44 161 834-4128
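A file produced this way can be read back with Python's standard csv module, e.g. for quick post-processing. A sketch (the data is inlined here so the snippet is self-contained; normally you would open cb.xml.csv):

```python
import csv
import io

# The CSV output shown above, header row first.
data = """name,phone
Hilton Chambers,+44 161 236-4414
Sachas Hotel,None
The Mitre Hotel,+44 161 834-4128
"""

rows = list(csv.DictReader(io.StringIO(data)))
print(rows[0]["phone"])   # +44 161 236-4414
```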
Using filter¶
If we want to change the data on the fly during conversion, we can use a filter.
There are a few filters shipped with the datconv package, see: datconv.filters package.
But usually you need to write your own custom filter. For instance, imagine that in the conversion described above
we want to skip records that do not have a phone number. We would write the following filter:
# Standard Python Libs
import logging

# Libs installed using pip
from lxml import etree

# Datconv generic modules
from datconv.filters import SKIP, WRITE, REPEAT, BREAK

Log = None

class DCFilter:
    def filterRecord(self, record):
        tag = record.find('.//phone')
        if tag is not None and tag.text != 'None':
            return WRITE
        else:
            return SKIP
and save it as file with_phone.py in a folder custom created where we run the datconv program.
In addition, we have to create an empty file __init__.py in this folder (to make it a valid Python package) and add the
following key to the conversion configuration file:
Filter:
Module: custom.with_phone
Then, when you run the conversion, you will get the expected result:
name,phone
Hilton Chambers,+44 161 236-4414
The Mitre Hotel,+44 161 834-4128
Note that the current folder is automatically added to the Python search path by the datconv script.
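The filter logic can also be exercised standalone before wiring it into a configuration. A minimal sketch using the standard library's ElementTree (datconv actually hands the filter lxml elements, but find behaves the same for this check; the SKIP/WRITE values below are illustrative stand-ins for the constants datconv.filters defines):

```python
import xml.etree.ElementTree as ET

SKIP, WRITE = 0, 1   # illustrative stand-ins for datconv.filters constants

def filter_record(record):
    # Same logic as DCFilter.filterRecord above.
    tag = record.find('.//phone')
    if tag is not None and tag.text != 'None':
        return WRITE
    return SKIP

with_phone = ET.fromstring(
    '<results><name>Hilton Chambers</name>'
    '<phone>+44 161 236-4414</phone></results>')
no_phone = ET.fromstring(
    '<results><name>Sachas Hotel</name><phone>None</phone></results>')

print(filter_record(with_phone) == WRITE)  # True
print(filter_record(no_phone) == SKIP)     # True
```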
Concatenating several filters¶
If we have a library of generic filters and would like to use a few of them in one data conversion run,
this is possible with the provided pipe filter.
E.g. the following configuration will use the above filter plus the standard filter that removes the name
field from the output records:
Filter:
Module: datconv.filters.pipe
CArg:
flist:
- Module: custom.with_phone
- Module: datconv.filters.delfield
CArg:
field: [name]
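Conceptually, such a pipe applies each sub-filter in turn and stops as soon as one of them skips the record. A rough sketch of that behavior on plain dictionaries (not datconv's actual implementation, which operates on record elements; SKIP/WRITE values are illustrative):

```python
SKIP, WRITE = 0, 1   # illustrative stand-ins for datconv.filters constants

def pipe(filters, record):
    # Run each filter in order; the first SKIP drops the record.
    for f in filters:
        if f(record) == SKIP:
            return SKIP
    return WRITE

def has_phone(rec):
    return WRITE if rec.get('phone') not in (None, 'None') else SKIP

def del_name(rec):
    rec.pop('name', None)   # mimics delfield with field: [name]
    return WRITE

record = {'name': 'Hilton Chambers', 'phone': '+44 161 236-4414'}
print(pipe([has_phone, del_name], record) == WRITE)          # True
print('name' not in record)                                  # True
print(pipe([has_phone, del_name], {'phone': 'None'}) == SKIP)  # True
```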
Populating database¶
Let's say we have the output file from: JSON to XML conversion example (file cb.json.xml
) that we would like to import into an SQLite database. First, we have to create a table with the respective
fields to store the data. The connector module datconv.outconn.sqlite.ddl can be helpful here. Let's run datconv with the following configuration:
Reader:
Module: datconv.readers.dcxml
CArg:
rectags: [results]
PArg:
inpath: ./sampl.xml
Writer:
Module: datconv.writers.dcxpaths
CArg:
add_header: false
add_type: true
ignore_rectyp: true
OutConnector:
Module: datconv.outconn.sqlite.ddl
CArg:
path: ./sampl.sql
table: sample
This will produce a file sampl.sql with the proposed table definition:
CREATE TABLE sample (
name TEXT,
phone TEXT
);
Edit this definition (possibly to define a primary key, etc.) and run it against your database to create the table. This step may look like art for art's sake in this simple example, but with a large number of fields or tables it may save you some typing or copy/pasting.
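Executing the edited DDL against the database can be done with any SQLite client; a sketch with Python's sqlite3 module (in-memory here for illustration, the DDL from sampl.sql is inlined for brevity):

```python
import sqlite3

# The (possibly edited) table definition produced in sampl.sql.
ddl = """
CREATE TABLE sample (
    name TEXT,
    phone TEXT
);
"""

con = sqlite3.connect(':memory:')   # use './sampl.sqlite' for a real file
con.executescript(ddl)
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)   # ['sample']
con.close()
```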
Then, change the Writer and Connector in your configuration file in the following way
(put the path to your SQLite database file as the connstring: parameter):
Reader:
Module: datconv.readers.dcxml
CArg:
rectags: [results]
PArg:
inpath: ./sampl.xml
Writer:
Module: datconv.writers.dcjson
CArg:
ignore_rectyp: true
OutConnector:
Module: datconv.outconn.sqlite.jinsert
CArg:
connstring: ./sampl.sqlite
table: sample
and run datconv again to insert records into the table.
Note the use of the ignore_rectyp option to get rid of the outer <results> tag, which otherwise
would be interpreted as the only column in the table. In case of problems with inserts,
one can add the dump_sql option to the Connector in order to dump the generated INSERT statements to a
file, or change the OutConnector to a plain file to check the generated JSON.
Default Configuration¶
It may happen that we have some typical files or a typical conversion scenario that we frequently use. In such a case it is reasonable to use a Default Configuration file.
Let's say we frequently do the conversion described in section JSON to XML conversion example.
To simplify program usage, we may save the configuration file for this case as the file .datconv_def.yml
located in the root of our home folder, making one change in the file:
replacing the file names with positional arguments, like below:
Reader:
Module: datconv.readers.dcijson_keys
CArg:
headkeys: [requestID, signature]
reckeys: [results]
footkeys: [status, metrics]
PArg:
inpath: $1
outpath: $2
Writer:
Module: datconv.writers.dcxml
CArg:
pretty: true
After that, one may call datconv in the following way:
datconv def <input file> <output file>
to perform the conversion according to the saved schema.