CSV Import

The CSV importer allows property graphs to be imported into the graph database. It requires nodes and edges to be specified in a CSV format.

Usage

The importer can be invoked via an HTTP GET request or via the utility ‘octopus-csvimport.sh’ in projects/octopus.

The HTTP GET request can be issued with curl as follows:

curl http://localhost:2480/importcsv/<nodeFilename>/<edgeFilename>/<dbname>/

where nodeFilename is a CSV file containing nodes, edgeFilename is a CSV file containing edges, and dbname is the name of the database to import into.
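The same request URL can be assembled programmatically. The following is a minimal Python sketch, assuming the three path components are separated by slashes and that the server listens on localhost:2480 as in the curl example. The helper name build_import_url and the sample values nodes.csv, edges.csv, and mydb are hypothetical; how the server resolves the file names is not specified here.

```python
from urllib.parse import quote


def build_import_url(node_file, edge_file, dbname, host="localhost", port=2480):
    """Build the importcsv URL from its three path components."""
    # Percent-encode each component so that spaces or slashes in a name
    # cannot be mistaken for path separators.
    parts = [quote(p, safe="") for p in (node_file, edge_file, dbname)]
    return f"http://{host}:{port}/importcsv/" + "/".join(parts) + "/"


url = build_import_url("nodes.csv", "edges.csv", "mydb")
print(url)

# To trigger the import against a running server (not executed here):
# from urllib.request import urlopen
# with urlopen(url) as response:
#     print(response.status)
```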

Alternatively, the script ‘octopus-csvimport.sh’ can be invoked as follows:

projects/octopus/octopus-csvimport.sh [dbname]

where dbname is the name of the database. The tool automatically imports the files ‘nodes.csv’ and ‘edges.csv’ if they are present in the current working directory.

Configuration

None.

Input Format for Nodes

Nodes are described in a CSV file, where the first line describes the row format (CSV header) and the remaining lines contain the actual nodes. The tab character is used as the delimiter. Double quotes can be used to enclose field values that contain newlines or tabs.

The CSV header has two mandatory fields: command and key.

The command field specifies the action to perform for this node. The following commands are currently supported:

Name  Description
ANR   Add node, replacing any existing node with the same key.
A     Add node, creating an alternative key if a node with this key already exists. The alternative key is generated by adding an underscore followed by a number to the key.

The key field contains an identifier for the node. The key can be an arbitrary string. The strategy to follow when a node with this key already exists depends on the command.
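To illustrate the alternative-key rule of the A command, here is a small Python sketch. It is not the importer's implementation: the function name alternative_key is hypothetical, and both the starting number (1) and the choice of the first unused number are assumptions, since the text only says that an underscore followed by a number is added to the key.

```python
def alternative_key(key, existing_keys):
    """Return key if it is unused, otherwise key_<n> for the first unused n."""
    if key not in existing_keys:
        return key
    n = 1  # assumption: numbering starts at 1
    while f"{key}_{n}" in existing_keys:
        n += 1
    return f"{key}_{n}"


existing = {"func_main", "func_main_1"}
print(alternative_key("func_main", existing))
print(alternative_key("func_other", existing))
```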

The remaining fields specify the names of node properties. For an example, see the nodes.csv file generated by the bjoern-radare.sh tool.
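A node file in this format can be produced with Python's csv module, as in the sketch below. The property names type and repr, the sample keys, and the helper name write_nodes are hypothetical; placing command and key first in the header is also an assumption. Whether the importer handles quoted fields exactly as Python's csv module writes them (for example, doubled quote characters) is not specified here.

```python
import csv
import io


def write_nodes(rows, out):
    """Write node rows as tab-separated CSV, starting with a header line."""
    # Mandatory fields first; "type" and "repr" are hypothetical properties.
    fieldnames = ["command", "key", "type", "repr"]
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        delimiter="\t",
        quoting=csv.QUOTE_MINIMAL,  # quote only values containing tabs, quotes, or newlines
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_nodes(
    [
        {"command": "ANR", "key": "func_1", "type": "Function", "repr": "main"},
        # This value contains a tab, so the writer encloses it in double quotes.
        {"command": "A", "key": "instr_1", "type": "Instr", "repr": "mov eax,\t1"},
    ],
    buf,
)
nodes_csv = buf.getvalue()
print(nodes_csv)
```

To write to disk instead of an in-memory buffer, open the target with open("nodes.csv", "w", newline="") and pass the file object as out.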

Input Format for Edges

Edges between nodes are described in a CSV file, where nodes are referred to by their keys. The first line of the CSV file (CSV header) describes the row format.

The CSV header has three mandatory fields: sourcekey, destkey, and edgeType.

The sourcekey and destkey fields specify the keys of the edge’s source and destination nodes, respectively, while edgeType specifies the type of the edge as an arbitrary string.

The remaining fields are the names of edge properties. For an example, see the edges.csv file generated by bjoern-radare.sh.
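An edge file can be written in the same way. This Python sketch assumes the edge file is tab-separated like the node file, which the text does not state explicitly. The edge type CONTAINS, the property name comment, the node keys, and the helper name write_edges are hypothetical.

```python
import csv
import io


def write_edges(rows, out):
    """Write edge rows as tab-separated CSV, starting with a header line."""
    # Mandatory fields first; "comment" is a hypothetical edge property.
    fieldnames = ["sourcekey", "destkey", "edgeType", "comment"]
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_edges(
    [
        # sourcekey and destkey refer to keys defined in the node file.
        {"sourcekey": "func_1", "destkey": "instr_1",
         "edgeType": "CONTAINS", "comment": "entry"},
    ],
    buf,
)
edges_csv = buf.getvalue()
print(edges_csv)
```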