score:16

Accepted answer

Sorry for the late reply, we were busy building v2.8 ;) I would suggest doing as much as possible on the database side: copying JSON over the network and serializing/deserializing it is typically expensive, so transferring as little data as possible is a good aim.

First of all, I ran your query on a sample dataset I created (about 800 vertices and 800 edges are hit in my dataset). As a baseline I used the execution time of your query, which in my case was ~5.0s.

So I tried to create exactly the result you need in AQL only. I found some possible improvements to your query:

1. GRAPH_NEIGHBORS is a bit faster than GRAPH_EDGES.
2. If possible, avoid {includeData: true} if you do not need the data. Especially if you only need the _id of the to/from vertices, GRAPH_NEIGHBORS with {includeData: false} outperforms GRAPH_EDGES by an order of magnitude.
3. GRAPH_NEIGHBORS is deduplicated, GRAPH_EDGES is not, which seems to be what you want in your case.
4. You can get rid of a couple of subqueries there.
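To get a feeling for why skipping the document data pays off, here is a rough sketch in plain JavaScript (no ArangoDB involved). The document shape and field names are made up for illustration; it only compares how much JSON would have to be serialized and shipped if full documents are returned versus just their _id strings.

```javascript
// Hypothetical neighbour documents; the fields are invented for this sketch.
var neighbours = [];
for (var i = 0; i < 800; i++) {
  neighbours.push({
    _id: "ExampleDocClass/" + i,
    _key: String(i),
    title: "Document number " + i,
    body: "Some longer text payload that the client never looks at."
  });
}

// Roughly what you transfer when full documents are returned.
var withData = JSON.stringify(neighbours);

// Roughly what you transfer when only the _id strings are returned.
var idsOnly = JSON.stringify(neighbours.map(function (v) {
  return v._id;
}));

console.log("full documents:", withData.length, "characters");
console.log("ids only:      ", idsOnly.length, "characters");
```

The real saving depends on how large your documents are, so treat the numbers as an illustration rather than a measurement.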

So here is the pure AQL query I could come up with:

LET docId = "ExampleDocClass/1234567"
LET edges = GRAPH_EDGES('EdgeClass',docId,{direction:'any',maxDepth:1,includeData:true})
LET verticesTmp = (FOR v IN GRAPH_NEIGHBORS('EdgeClass', docId, {direction: 'any', maxDepth: 1, includeData: true})
  RETURN {
    vertexData: v,
    outEdges: GRAPH_NEIGHBORS('EdgeClass', v, {direction: 'outbound', maxDepth: 1, includeData: false}),
    inEdges: GRAPH_NEIGHBORS('EdgeClass', v, {direction: 'inbound', maxDepth: 1, includeData: false})
  })
LET vertices = PUSH(verticesTmp, {
  vertexData: DOCUMENT(docId),
  outEdges: GRAPH_NEIGHBORS('EdgeClass', docId, {direction: 'outbound', maxDepth: 1, includeData: false}),
  inEdges: GRAPH_NEIGHBORS('EdgeClass', docId, {direction: 'inbound', maxDepth: 1, includeData: false})
})
RETURN { edges, vertices }

This yields the same result format as your query and has the advantage that every vertex connected to docId is stored exactly once in vertices. docId itself is also stored exactly once in vertices, so no deduplication is required on the client side. However, in the outEdges / inEdges of each vertex, every connected vertex also appears exactly once; I do not know whether you need to see multiple edges between two vertices in these lists as well.
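To make the difference concrete, here is a small sketch in plain JavaScript on a made-up edge list (the ids and both helper functions are hypothetical, not part of any ArangoDB API). It contrasts a per-edge list, which keeps parallel edges visible, with a deduplicated neighbour list like the one the query above produces.

```javascript
// Hypothetical edges; note the two parallel edges from v/a to v/b.
var edges = [
  { _from: "v/a", _to: "v/b" },
  { _from: "v/a", _to: "v/b" },
  { _from: "v/a", _to: "v/c" }
];

// One entry per edge: parallel edges show up as repeated ids.
function outboundPerEdge(edges, id) {
  return edges
    .filter(function (e) { return e._from === id; })
    .map(function (e) { return e._to; });
}

// Each neighbour once: keep only the first occurrence of every id.
function outboundDeduplicated(edges, id) {
  return outboundPerEdge(edges, id).filter(function (to, i, all) {
    return all.indexOf(to) === i;
  });
}

console.log(outboundPerEdge(edges, "v/a"));      // [ 'v/b', 'v/b', 'v/c' ]
console.log(outboundDeduplicated(edges, "v/a")); // [ 'v/b', 'v/c' ]
```

If you need the multiplicity, the deduplicated form is not enough and you have to look at the edges themselves.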

This query takes ~0.06s on my dataset.

However, if you put some more effort into it, you could also consider using a hand-crafted traversal inside a Foxx application. This is a bit more complicated but might be faster in your case, as you run fewer subqueries. The code for this could look like the following:

var traversal = require("org/arangodb/graph/traversal");
var result = {
  edges: [],
  vertices: {}
};
var myVisitor = function (config, result, vertex, path, connected) {
  switch (path.edges.length) {
    case 0:
      if (! result.vertices.hasOwnProperty(vertex._id)) {
        // If we visit a vertex, we store its data and prepare out/in
        result.vertices[vertex._id] = {
          vertexData: vertex,
          outEdges: [],
          inEdges: []
        };
      }

      // No further action
      break;
    case 1:
      if (! result.vertices.hasOwnProperty(vertex._id)) {
        // If we visit a vertex, we store its data and prepare out/in
        result.vertices[vertex._id] = {
          vertexData: vertex,
          outEdges: [],
          inEdges: []
        };
      }
      // First Depth, we need EdgeData
      var e = path.edges[0];
      result.edges.push(e);
      // We fill from / to for both vertices
      result.vertices[e._from].outEdges.push(e._to);
      result.vertices[e._to].inEdges.push(e._from);
      break;
    case 2:
      // Second Depth, we do not need EdgeData
      var e = path.edges[1];
      // We fill from / to for all vertices that exist
      if (result.vertices.hasOwnProperty(e._from)) {
        result.vertices[e._from].outEdges.push(e._to);
      }
      if (result.vertices.hasOwnProperty(e._to)) {
        result.vertices[e._to].inEdges.push(e._from);
      }
      break;
  }
};
var config = {
  datasource: traversal.generalGraphDatasourceFactory("EdgeClass"),
  strategy: "depthfirst",
  order: "preorder",
  visitor: myVisitor,
  expander: traversal.anyExpander,
  minDepth: 0,
  maxDepth: 2
};
var traverser = new traversal.Traverser(config);
traverser.traverse(result, {_id: "ExampleDocClass/1234567"});
return {
  edges: result.edges,
  vertices: Object.keys(result.vertices).map(function (key) {
              return result.vertices[key];
            })
};

The idea of this traversal is to visit all vertices up to two edges away from the start vertex. All vertices at depth 0 and 1 are added with their data to the vertices object. All edges connected to the start vertex (in either direction, because of anyExpander) are added with their data to the edges list. Vertices at depth 2 only contribute entries to the outEdges / inEdges in the result.

This has the advantage that vertices is deduplicated, while outEdges / inEdges contain a connected vertex multiple times if there are multiple edges between the two.
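If you want to see the bookkeeping without a database, here is a minimal in-memory sketch in plain JavaScript. It does not use the ArangoDB traversal module: the tiny walk function, the sample edges and all ids are made up, and I assume that an edge is never reused within one path. The visit function is a condensed version of the visitor logic described above.

```javascript
// Hypothetical sample graph; e/2 and e/3 are parallel edges from A to C.
var edges = [
  { _id: "e/1", _from: "v/S", _to: "v/A" },
  { _id: "e/2", _from: "v/A", _to: "v/C" },
  { _id: "e/3", _from: "v/A", _to: "v/C" },
  { _id: "e/4", _from: "v/D", _to: "v/S" }
];

// Condensed visitor: depth 0 and 1 register the vertex, depth 1 also
// stores the edge, depth 1 and 2 fill outEdges / inEdges of known vertices.
function visit(result, vertex, path) {
  var depth = path.edges.length;
  if (depth <= 1 && !result.vertices.hasOwnProperty(vertex._id)) {
    result.vertices[vertex._id] = { vertexData: vertex, outEdges: [], inEdges: [] };
  }
  if (depth === 0) { return; }
  var e = path.edges[depth - 1];
  if (depth === 1) { result.edges.push(e); }
  if (result.vertices.hasOwnProperty(e._from)) {
    result.vertices[e._from].outEdges.push(e._to);
  }
  if (result.vertices.hasOwnProperty(e._to)) {
    result.vertices[e._to].inEdges.push(e._from);
  }
}

// Depth-first, preorder walk following edges in any direction.
// Assumption of this sketch: an edge is not walked twice on the same path.
function walk(result, vertexId, path, maxDepth) {
  visit(result, { _id: vertexId }, path);
  if (path.edges.length === maxDepth) { return; }
  edges.forEach(function (e) {
    if (path.edges.indexOf(e) !== -1) { return; }
    var next = e._from === vertexId ? e._to : (e._to === vertexId ? e._from : null);
    if (next !== null) {
      walk(result, next, { edges: path.edges.concat([e]) }, maxDepth);
    }
  });
}

var result = { edges: [], vertices: {} };
walk(result, "v/S", { edges: [] }, 2);

console.log(Object.keys(result.vertices));     // [ 'v/S', 'v/A', 'v/D' ]
console.log(result.vertices["v/A"].outEdges);  // [ 'v/C', 'v/C' ]
console.log(result.edges.length);              // 2
```

C sits at depth 2, so it gets no entry in vertices, but it shows up twice in the outEdges of A because of the two parallel edges. Note that in this sketch an edge between two depth-1 vertices would be reached once from each side, so the counts for such edges depend on the uniqueness rule you choose.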

This traversal executes in ~0.025s on my dataset, so it is about twice as fast as the AQL-only solution.

Hope this still helps ;)

