Preserving node names in a digraph

Question

Michael on 28 Feb 2018

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/385512-preserving-node-names-in-a-digraph

Commented: Christine Tobler on 5 Mar 2018

I am constructing a large digraph with 10k to 100k nodes, in which I want to add, delete, and merge nodes. The nodes represent objects whose other data is stored externally and indexed numerically, so the node IDs must be preserved in order to reference the related data correctly.

Is there a way of preserving node IDs in a graph, other than giving the nodes string names?

In the following code

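from_node = [1 1 2 3 4 4 5 6 7 3];
to_node   = [3 2 5 7 6 5 7 7 2 4];
weights = rand(size(from_node));
% digraph(s,t,w) expects the source nodes first, then the target nodes
g = digraph(from_node, to_node, weights);
h = rmnode(g, 2);   % the remaining nodes are renumbered 1..6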

When you remove node 2, the remaining nodes are renumbered, so a different node becomes node 2, unless you specify node names, which must be strings, like this:

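from_node = [1 1 2 3 4 4 5 6 7 3];
to_node   = [3 2 5 7 6 5 7 7 2 4];
weights = rand(size(from_node));
names = cellstr(string(1:7));   % node names '1'..'7'
% digraph(s,t,w,names) expects the source nodes first, then the target nodes
g = digraph(from_node, to_node, weights, names);
h = rmnode(g, findnode(g, num2str(2)));   % look up node '2' by name, then remove it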
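To see the renumbering concretely, here is a minimal sketch on a smaller, hypothetical graph (the three edges below are made up for illustration and are not the graph from the question). It uses no node names, so only numeric indices are available:

```matlab
% Small digraph with edges 1->2, 2->3, 3->4 (numeric node IDs only, no names)
g = digraph([1 2 3], [2 3 4]);

% Remove node 2 together with its incident edges 1->2 and 2->3
h = rmnode(g, 2);

n = numnodes(h);                 % 3: the remaining nodes are renumbered 1..3
edgesAfter = h.Edges.EndNodes;   % the former edge 3->4 now appears as 2->3
```

The surviving edge still connects the same two objects, but both of its endpoints have shifted down by one, which is exactly what breaks an external numeric index.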

This is fine for small graphs, but for very large graphs that must be modified it is extremely memory-inefficient, since you are forced to store a giant table of strings that merely duplicates the numeric node IDs.

Moreover, each removal then requires a findnode search that involves converting the number to a string, which could also be costly if done many times.

Therefore, I am wondering whether there is a more efficient way of preserving node IDs upon insertion/deletion than using the names.

Thanks!!


Answer 1

Walter Roberson on 1 Mar 2018

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/385512-preserving-node-names-in-a-digraph#answer_307786

Convert the node numbers to base 2^16 and char() the result, then use those as the strings. For node numbers no larger than 100k, this takes two characters (4 bytes) each, plus any overhead from cell arrays.

2 Comments

Michael on 1 Mar 2018

How would you suggest doing this efficiently? dec2base only goes up to base 36 and I don't want to impose a strong computational load on this if I have to convert thousands of indices at once.

Thanks!

Walter Roberson on 1 Mar 2018

Labels = char(reshape(typecast(uint32(Indices),'uint16').',2,[]).');
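As a rough check of the one-liner above, the following sketch encodes a few hypothetical IDs and decodes them again. The variable names `words` and `Decoded` and the sample values are assumptions for illustration. Whether digraph accepts such labels as node names (they can contain unprintable characters such as char(0)) is not checked here; this only verifies the numeric round trip:

```matlab
Indices = [1 2 65536 100000];   % hypothetical external node IDs (below 2^32)

% Encode: view each uint32 as two uint16 words and turn each word into one
% char, giving an N-by-2 char array with one row per ID.
Labels = char(reshape(typecast(uint32(Indices), 'uint16').', 2, []).');

% Decode: reverse the steps. Each column of words holds the two 16-bit
% halves of one ID in the same order that typecast produced them.
words = uint16(Labels).';                          % 2-by-N
Decoded = double(typecast(words(:).', 'uint32'));  % 1-by-N numeric IDs
```

Because the decode uses typecast in the opposite direction, the round trip does not depend on the byte order of the machine.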

Answer 2

Christine Tobler on 1 Mar 2018

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/385512-preserving-node-names-in-a-digraph#answer_307936

Edited: Christine Tobler on 1 Mar 2018

Unfortunately, there is no direct way of doing this. The graph and digraph classes are designed to be fast when working on an existing graph, but this comes at the cost of being relatively slow when adding and removing nodes one at a time.

To avoid having to convert the numbers to strings, you could construct and maintain two vectors that map between the external indices and the graph indices, one for each direction. For example, like this:

maxExtInd = 1e6;
s = [1234 6543 765];
t = [6543 765 1234];
% graph2ext(indexIntoGraph) returns externalIndex
graph2ext = unique([s(:); t(:)]);
% ext2graph(externalIndex) returns indexIntoGraph 
%   (or zero if externalIndex is not in the graph)
ext2graph = sparse(maxExtInd, 1);
ext2graph(graph2ext) = 1:numel(graph2ext);
% Construct the graph:
g = graph(full(ext2graph(s)), full(ext2graph(t)));
graph2ext(g.Edges.EndNodes)
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
% Add a node:
newNode = 456;
assert(ext2graph(newNode) == 0); % Check the node ID is not already in the graph
g = addnode(g, 1);
graph2ext(end+1) = newNode;
ext2graph(newNode) = numnodes(g);
figure;
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
% Remove a node:
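nodeToRemove = 1234;
% full() turns the sparse scalar into a plain double before it is used as an index
graphNodeToRemove = full(ext2graph(nodeToRemove));
g = rmnode(g, graphNodeToRemove);
graph2ext(graphNodeToRemove) = [];
ext2graph(nodeToRemove) = 0;
% Nodes after the removed one move down by one position in the graph
shifted = ext2graph > graphNodeToRemove;
ext2graph(shifted) = ext2graph(shifted) - 1;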
figure;
plot(g, 'NodeLabel', graph2ext);
conversionTable = [find(ext2graph(:)), nonzeros(ext2graph)]
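If many nodes are removed at once, one possible variation on the bookkeeping above is to delete them in a single rmnode call and then rebuild ext2graph from graph2ext, rather than decrementing after every removal. This is only a sketch under assumptions: the small triangle graph, the IDs, and the name `extToRemove` are hypothetical.

```matlab
maxExtInd = 1e6;
graph2ext = [765; 1234; 6543; 456];     % external ID of graph nodes 1..4
g = graph([1 2 3], [2 3 1]);            % triangle on graph nodes 1..3
g = addnode(g, 1);                      % isolated node 4 (external ID 456)

% Reverse map: ext2graph(externalIndex) is the graph index, or 0 if absent
n = numel(graph2ext);
ext2graph = sparse(graph2ext, ones(n,1), (1:n).', maxExtInd, 1);

extToRemove = [1234 456];               % hypothetical external IDs to delete
idx = full(ext2graph(extToRemove));     % their graph indices, here [2; 4]
g = rmnode(g, idx);                     % survivors keep their relative order
graph2ext(idx) = [];

% Rebuild the reverse map in one step from the updated forward map
n = numel(graph2ext);
ext2graph = sparse(graph2ext, ones(n,1), (1:n).', maxExtInd, 1);
```

After this, external ID 765 maps to graph node 1 and 6543 to graph node 2, while the removed IDs map to 0.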

2 Comments

Michael on 2 Mar 2018

Thank you! I think this is a very nice solution. I'm wondering about your thoughts on the trade-off between speed and memory in this particular situation.

In this case, we have to maintain the graph plus a 1xnum_nodes vector of 8-byte doubles and a sparse vector with num_nodes 8-byte double entries, versus a 1xnum_nodes table column of 4-byte 1x2 char arrays. For 1000 entries, factoring in the overhead of the table object, I estimate roughly 6.5x memory savings from keeping the table char array. However, with the vectors we don't have to mess with conversions.

Do you think the lookup in the sparse/double array will be faster than calling findnode with the proper node name?

I was able to improve on the speed of the previous suggestion for the conversion, assuming fewer than 2^32 entries, using

char([floor(num./65536) rem(num,65536)])

to convert and

sum(double(cell2mat(nodenames)).*[65536 1],2)

to reverse, but there is certainly overhead using the findnode() functions within the digraph object and conversion to cell arrays of chars needed to use the digraph object.
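For reference, the two expressions above can be checked against each other with a small round trip. The sample IDs are hypothetical, and the cell array is built with num2cell here purely for illustration (an assumption, not necessarily how the names were stored in the original code). The element-wise product with [65536 1] relies on implicit expansion, so this sketch assumes R2016b or later:

```matlab
num = [1; 70000; 100000];     % hypothetical node IDs as a column vector

% Encode: high 16 bits and low 16 bits become the two chars of each label
labels = char([floor(num./65536) rem(num,65536)]);   % N-by-2 char array

% One 1-by-2 char label per cell, as a cell array of character vectors
nodenames = num2cell(labels, 2);

% Decode: weight the first char by 65536 and the second by 1, then sum rows
decoded = sum(double(cell2mat(nodenames)).*[65536 1], 2);
```

Each ID must be below 2^32 so that the high part still fits into a single 16-bit char.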

You mentioned that it is optimized to be fast for operations but slow for manipulation and I see this to be true. When testing out my code, the biggest overhead is in adding an edge which calls expandTable(), which is very costly.

What is it about the table object that makes it the better choice for designing the graph object, rather than just defining the nodes as a sparse array and the edges as either a binary sparse matrix or, for a weighted digraph, a double sparse matrix? I'm very interested in which data structures are best for which jobs.

Thanks so much!

Christine Tobler on 5 Mar 2018

Hi Michael,

With the table char array, you should factor in not only the cost for each 4-byte char array, but also the additional mxArray header (which for each element of a cell array, specifies its datatype and additional information). Also, the sparse array indexing will do a binary search, which the graph object's findnode is not (currently) doing on its node names.
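The per-element header cost can be observed with whos. The sketch below builds both representations for 1000 hypothetical IDs and compares the byte counts that MATLAB reports; the variable names are assumptions, and the exact numbers depend on the MATLAB release and platform, so none are quoted here:

```matlab
n = 1000;
ids = (1:n).';                                     % hypothetical external IDs

% Two-vector approach: a full double vector plus a sparse reverse lookup
graph2ext = ids;
ext2graph = sparse(ids, ones(n,1), ids, 1e6, 1);

% Name-based approach: one 1-by-2 char label per cell
names = num2cell(char([floor(ids./65536) rem(ids,65536)]), 2);

infoVec   = whos('graph2ext');
infoSp    = whos('ext2graph');
infoNames = whos('names');
bytesVectors = infoVec.bytes + infoSp.bytes;
bytesNames   = infoNames.bytes;
fprintf('vectors: %d bytes, cell array of names: %d bytes\n', ...
    bytesVectors, bytesNames);
```

On a typical 64-bit installation the cell array is expected to come out larger, because every cell carries its own header in addition to the 4 bytes of character data.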

You're right about expandTable being the main overhead. If there are no node or edge properties (that is, if you store node names and edge weights separately during the loop), this overhead should decrease drastically.
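One reading of this suggestion, sketched under assumptions: add the edges without a Weight argument so that the graph carries no edge properties, keep the weights in a plain array, and align that array with the graph's edge order once the loop is done. The edge list, weights, and the name `wByEdge` are hypothetical, and no timing claims are made here:

```matlab
s = [3 1 2];  t = [1 2 3];       % hypothetical edges 3->1, 1->2, 2->3
w = [0.3 0.1 0.2];               % their weights, kept outside the graph

g = digraph;                     % empty digraph without any properties
for k = 1:numel(s)
    g = addedge(g, s(k), t(k));  % no Weight argument: edge table stays minimal
end

% The graph stores edges in its own order, so look up where each (s,t)
% pair ended up and place its weight at that position.
edgeIdx = findedge(g, s, t);
wByEdge = zeros(numedges(g), 1);
wByEdge(edgeIdx) = w;
```

wByEdge(k) is then the weight of the k-th edge in g.Edges; it has to be realigned in the same way whenever edges are added or removed.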

The table object is not used to represent the structure of the graph object internally; we are using it only for the node and edge properties. There, it has the advantage of allowing properties of arbitrary datatypes to be stored in a simple manner. For cases where the graphs are modified many times, there is unfortunately a large overhead associated with the node and edge tables.
