MATLAB Answers


Dynamic variable names for full workspace operations

Asked by D. Plotnick on 1 Feb 2017
Latest activity Commented on by per isakson
on 27 Feb 2017
To start with, I understand dynamic variable names are bad. I am not really trying to use them. What I really want to do is apply a specific operation to all variables in the current workspace; this way I can generate a generic function to apply that operation.
Two examples:
Example 1
Let's say that I have a code where I can use double or single precision depending on user choice. I want to cycle through all of the workspace variables looking for e.g. doubles that have numel>1000 and convert all of them to single. I can use who to get my workspace, and then a loop with isa and a boolean to find all the variables that match those criteria. What I want to do now is perform the operation varname = single(varname) to reassign those variables to the single-precision class while keeping the same name. Is there a way to do this other than using dynamic variable names?
Example 2
Let's say I ran into an "out-of-memory" error on the GPU because there is a bunch of junk left on there from other operations. I want to cycle through all gpuArray class variables and pull them down using varname = gather(varname), perform a reset(gpuDevice), and then possibly place them back on the GPU using varname = gpuArray(varname). Again, I understand that I could write code that knows all of the variable names; the point here is to generate generic code that can do the operation on all the correct workspace variables.
Again, if there is a totally obvious way of doing this that doesn't involve dynamic names, please let me know. Also, if there is something super bad about either of these concepts, I need to know that too.
Otherwise, how do you code something like this using dynamic variable names? MATLAB seems to make that kind of operation intentionally difficult.
Thanks for your help, -Dan

  4 Comments

"...can use double or single precision depending on user choice. I want to cycle through all of the workspace variables"
"Is there a way to do this other than using dynamic variable names?"
How did these variables get into the workspace?
Did you type them all? Likely no. Which means they were imported somehow, or are defined in the functions themselves.
If they are imported using load or any file reading function then there is absolutely no reason why there needs to be a workspace full of variables: simply import into one array, cell, structure, table, etc and then your task is trivial (one line, no eval).
And if they are there because you are running a user's script then run for the hills screaming. So you sensibly have functions, and are not doing absurd things with load or assignin or the like. Then there is no way you cannot have total control over how those variables get into the function workspace (at generation or import). Most likely you could split the code into sub-functions, cells, loops, or whatever, which then allows easy points to check the values and adjust the data class as you require. Thus your entire question becomes moot.
Even though you have written "I understand dynamic variable names are bad" it seems you have not realized that you can fix the problem at its source, not by trying to patch it up later.
"totally obvious way of doing this that doesn't involve dynamic names"
Yeah, don't have lots of variables. Pretty simple really.
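The "one struct instead of many variables" suggestion above could look roughly like the following. This is only a sketch: the struct S and its fields are made up for illustration, and in practice S might come from something like S = load(...) with an output argument.

```matlab
% Hypothetical data held as fields of one struct rather than as
% separate workspace variables.
S.a = rand(50);        % 2500 doubles: large enough to convert
S.b = rand(3);         % 9 doubles: too small to bother with
S.c = 'some text';     % not a double: left alone

% Convert every double field with more than 1000 elements to single.
% Dynamic field names are used here, so no eval is needed.
names = fieldnames(S);
for k = 1:numel(names)
    v = S.(names{k});
    if isa(v, 'double') && numel(v) > 1000
        S.(names{k}) = single(v);
    end
end
```

The loop only touches fields that meet both criteria, so small or non-numeric fields keep their original class.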
For re-classing variables from double to single, I will mention that this will have issues if any of the variables are shared copies of other variables. A loop that does varname = single(varname) will effectively unshare the variables which would have negative memory usage consequences. E.g., a simplistic example:
X = zeros(125e6, 1); % a 1GB double array
Y = X; % a shared data copy of X.
At this point you only have 1GB of data in memory, since both X and Y are sharing data. Now see what happens when you make each of them single class:
X = single(X); % X is unshared with Y and turned into single
Y = single(Y); % Y is turned into single
After the 1st statement, total data memory is 1.5GB. After the 2nd statement, total data memory is back down to 1GB. But X and Y are not shared copies of each other any more, so there is no memory sharing benefit as was the case in the beginning. Ideally, if one knows that X and Y are shared, you would like to do something like this instead to get the total data memory down to 0.5GB:
X = single(X);
Y = X;
But there are no official mechanisms for detecting variable sharing status, either at the m-file level or in a mex routine. The only way to detect sharing is to hack into the variables in a mex routine, and even then it can get messy very quickly if there are cell arrays, struct arrays, or classdef objects involved.
Bottom line is that if there is a significant amount of data sharing involved, re-classing variables serially will wipe that sharing out and have negative memory usage consequences.
I did not realize data sharing was handled in that way; that is extremely helpful to know. I had thought that kind of sharing only applied when the data was stored in a handle class.


3 Answers

Answer by per isakson
on 3 Feb 2017
Edited by per isakson
on 8 Feb 2017
 Accepted Answer

There are good reasons to avoid eval (Here, I use eval as shorthand for eval, evalin and assignin), see
"Example 1": I don't think there is a solution without eval. But after all, eval exists in several languages, and that's for a reason, I assume.
Here is my attempt to answer "Example 1".
M1 = ones(2e4)+eps;
M2 = ones(1e4)+eps;
variables = reshape( whos('M*'), 1,[] );
for v = variables
    convert( v.name, 'single' )
end
whos('M*')
prints
  Name          Size                 Bytes  Class     Attributes
  M1        20000x20000         1600000000  single
  M2        10000x10000          400000000  single
where
function convert( variable_name, new_class )
    % convert variable, variable_name, to type, new_class, in the workspace of the caller
    %
    % assert that the value of variable_name is the name of a variable in the caller
    xpr = sprintf( 'exist( ''%s'', ''var'' );', variable_name );
    num = evalin( 'caller', xpr );
    %
    if num == 1
        %   str = sprintf( '%1$s = cast( %1$s, ''%2$s'' );', variable_name, new_class );
        %   sts = evalin( 'caller', str );
        %   Error: The expression to the left of the equals sign
        %          is not a valid target for an assignment.
        xpr = sprintf( 'cast( %s, ''%s'' );', variable_name, new_class );
        try
            assignin( 'caller', variable_name, evalin( 'caller', xpr ) );
        catch me
            fprintf( 2, 'Error: ''%s''\n', me.message );
        end
    else
        fprintf( 2, 'Undefined variable, ''%s''\n', variable_name );
    end
end
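The example above picks variables by name pattern. To pick them by class and size instead, as the question describes, the output of whos could be filtered first. A minimal sketch, assuming the hypothetical variables A, B and C; the resulting list of names is what one would then loop over:

```matlab
% Hypothetical workspace contents for illustration.
A = rand(40);            % 1600 doubles: should be selected
B = rand(2);             % 4 doubles: too small
C = single(rand(40));    % already single: not selected

% whos returns one struct per variable with fields name, size, class, ...
info = whos;

% Keep variables that are double and have more than 1000 elements.
isDouble = strcmp({info.class}, 'double');
isLarge  = cellfun(@prod, {info.size}) > 1000;
names    = {info(isDouble & isLarge).name};
```

Here names is a cell array of character vectors, so each entry can be passed to a function that takes a variable name.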
Stephen Cobeldick presents the following list of problems related to eval. I argue that my above use of eval avoids most of these problems.
  • Slow: the conversion in the above code is as fast as M1=cast(M1,'single'); M2=cast(M2,'single');
  • Buggy: No, not in this case. convert does one thing and it's possible to test it thoroughly.
  • Security Risk: Not in this case. All necessary tests may be done in convert.
  • Difficult to Work With: The use of convert should not cause any problems.
  • Obfuscated Code Intent: convert communicates the intent well enough.
  • Confuses Data with Code: Not applicable in this case.
  • Code Helper Tools do not Work: That's true in this case, but F1 works with convert.
ADDENDUM, 2017-02-08
An improved version of convert inspired by the comments by Jan Simon
function convert( variable_name, new_type )
    % convert variable, variable_name, to type, new_type, in the workspace of the caller
    narginchk( 2, 2 )
    assert( isa( variable_name, 'char' ), 'convert:IllegalClass' ...
        , '"%s" is not a character array', value2short(variable_name) )
    assert( isrow( variable_name ), 'convert:IllegalSize' ...
        , '"%s" is not a row', value2short(variable_name) )
    assert( isvarname( variable_name ), 'convert:IllegalName' ...
        , '"%s" is not a valid variable name', variable_name )
    assert( isa( new_type, 'char' ), 'convert:IllegalClass' ...
        , 'The type of new_type, %s, is not a char', value2short(new_type) )
    assert( isrow( new_type ), 'convert:IllegalSize' ...
        , 'The value of new_type, %s, is not a row', value2short(new_type) )
    type_list = {'int8','uint8','int16','uint16','int32','uint32' ...
        ,'int64','uint64','double','single','logical','char'};
    assert( any(strcmp( new_type, type_list )), 'convert:IllegalType' ...
        , 'The value of new_type, %s, is not a valid type name', new_type )
    % assert that the value of variable_name is the name of a variable in the caller
    xpr = sprintf( 'exist(''%s'', ''var'' );', variable_name );
    assert( evalin('caller',xpr) == 1, 'convert:UndefinedVariable' ...
        , '"%s" is not a defined variable', variable_name )
    cmd = sprintf( 'builtin( ''cast'', %s, ''%s'' );', variable_name, new_type );
    try
        assignin( 'caller', variable_name, evalin( 'caller', cmd ) );
    catch me
        fprintf( 2, 'Error: "%s"\n', me.message );
    end
end
where
function str = value2short( val )
    % value2short converts value to a short string that is suitable to display
    %
    % See also: mat2str
    %
    if nargin > 0
        str = workspacefunc( 'getshortvalue', val );
        max_len = 48;
        if length( str ) >= max_len
            str = [ str(1:max_len-4 ), ' ...' ];
        end
    else
        str = 'NIL';
    end
end

  12 Comments

@Stephen Cobeldick, you owe me an elaboration of "If this were my task I would write it as a nested function: simple, intuitive, easy." I would definitely choose your solution over my function, convert. However, I fail to understand how to implement your solution.
@per isakson: there is no trick. It was simply me stating what I would do if I were writing a function where at some unknown point during the calculation I needed to change the class of some variables. Here are a few starting assumptions:
  1. The variables are known. For me this is a perfectly reasonable assumption as I never have unknown variables in my workspace (never use load directly into the workspace, avoid assignin, eval, or other dynamic variable names).
  2. There are only a few variables. Again for me quite reasonable, because I do not fill my workspace with thousands of variables: that is what arrays are for.
  3. The variables are accessible to the "change" function.
I do not claim that this will change many arbitrary, not previously specified variables, because I never have unknown variables in my workspace anyway (as we all know, that path leads to JIT problems, obfuscation, and hard to fix bugs). It does not happen in my code, therefore I do not need to solve that problem. I prefer to solve tasks through good design, rather than trying to patch them up later (and hence this nested function).
So in the end my code would have (by design) no unknown variables, and if there were more than a few values, have them stored in some array, giving:
function out = test(N) % try around 12
    %
    Z = 0;
    for k = 1:N
        work()
    end
    %
    function work()
        Z = Z+1; % my work
        % change can be triggered anywhere:
        if rand()>0.8
            change()
        end
    end
    %
    function change()
        if Z>10 % condition
            Z = single(Z);
        end
    end
    %
    out = class(Z);
end
Note that change can be called by any other nested or local functions, callbacks, timers, listeners, etc., at any point during the calculation.
I do not claim that this answers the original question of "cycle through all of the workspace variables": for the reasons I have given that problem would never occur in my code, allowing me to use this simple nested function to simply resolve the task of converting at any arbitrary moment during calculations involving my known variables.
Rather than trying to sledgehammer my way through my workspace, instead I asked myself: what am I trying to achieve, and found an elegant solution for that.
@Stephen Cobeldick, Thank you for your answer. I agree fully regarding "good design" and "no unknown variables".
I assumed as a premise that OP had painted himself into a corner. After reading the question more carefully I realize that OP posed the question out of curiosity.



Answer by Edric Ellis
on 2 Feb 2017

For the gpuArray case, you could simply use save and load, i.e.
tempFile = [tempname(), '.mat']; % explicit extension, so delete() finds the file that save() writes
save(tempFile);
reset(gpuDevice);
load(tempFile);
delete(tempFile);

  2 Comments

Hmmm, I hadn't considered that. The problem is file sizes and transfer speeds. Several of the variables alone are > 1GB. Pulling them on and off the graphics card is relatively fast. Dropping them onto the hard disk has two major drawbacks (1) this is a much slower operation even with an SSD and (2) due to the size of the resulting file I would need to use the '-v7.3' flag on 'save', which always makes things absolutely chug. Essentially I want to pull all of the active Matlab variables from the GPU onto RAM, flush the GPU, then put the variables back up. Right now I just do it on a variable by variable basis within the code, which works but is also a pain to code and leads to a lot of time debugging to make sure I caught everything.
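If the GPU arrays were kept as fields of a single struct (an assumption for illustration, not the setup described above), the gather, reset and re-upload cycle could be sketched as follows. The struct G and its fields are hypothetical, and the code needs Parallel Computing Toolbox and a supported GPU to run.

```matlab
% Hypothetical container: all GPU data lives in fields of one struct.
% Requires Parallel Computing Toolbox and a supported GPU.
G.x = gpuArray(rand(100));
G.y = gpuArray(rand(10));
G.n = 42;                       % ordinary host variable, left untouched

names    = fieldnames(G);
wasOnGpu = false(size(names));

% Pull every gpuArray field down to host memory.
for k = 1:numel(names)
    if isa(G.(names{k}), 'gpuArray')
        wasOnGpu(k) = true;
        G.(names{k}) = gather(G.(names{k}));
    end
end

reset(gpuDevice);               % clear the device

% Send the same fields back to the GPU.
for k = find(wasOnGpu).'
    G.(names{k}) = gpuArray(G.(names{k}));
end
```

This avoids both the disk round trip and any eval-style calls, at the cost of keeping the data in a struct from the start.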
I did some poking around and thought I was getting somewhere but it didn't work. I was looking for a way to get at the workspace of the current function, with the idea that altering the workspace would be equivalent to altering the variable. I found that if you declare a nested function and use functions() that you get a workspace of the nested function that includes all variables in the parent assigned at the point you took the handle, which seemed like a doable way of getting access to your own workspace. Unfortunately changing the workspace did not change the variables in the function even for the shared variables. I was not able to get further on this.
It did leave me wondering if it would work for moving values in and out of the GPU array. If you have a shared variable that is assigned a gpu array and you gather it and send it again, then does that affect the original gpu array? The gather is clearly going to bring it back, but the rewrite might instead create a second variable. I consider evalin('caller') to be a form of eval() though others might disagree I guess.



Answer by Joss Knight
on 5 Feb 2017
Edited by Joss Knight
on 5 Feb 2017

Well, if you're really serious about a tool for managing storage of GPU arrays, then you need a new class. This would be a numeric handle type that forwards all its functions to the underlying type, and adds all new objects to a static list. Every function runs inside a try...catch statement that catches parallel:gpu:array:OOM and, if it is triggered, calls a static utility function to gather the contents of the list back to the host and then tries again.
The only difficulty here is that you need to provide an implementation of every single method you want your new type to implement, i.e. every method of gpuArray (and a few more that aren't methods of gpuArray but are functions that can take gpuArray inputs). But that code could be autogenerated fairly easily.

  2 Comments

I think I understand how to do what you are saying, and it may someday be worth it. However, for now it looks like considerably more work than taking the time to check my memory footprint during code prototyping, and just coding in a hard limit on the size of the variables I am using. Still, a really interesting idea that I wouldn't have thought of.
However I was wondering about your last comment about autogenerating code that includes any called methods; I did not know that was possible, do you have a link to any tutorial?
As always, thanks Joss.
It's just a boiler-plate method for any function, so, say, for plus:
function varargout = plus(varargin)
    % This bit swaps out the custom-type arguments for
    % their underlying gpuArray property
    for i = 1:numel(varargin)
        if isa(varargin{i}, 'MyManagedGPUArrayType')
            varargin{i} = varargin{i}.UnderlyingArrayProperty;
        end
    end
    % Try at least twice
    for i = 1:2
        try
            [varargout{1:nargout}] = plus(varargin{:});
        catch me
            % Give up on the second failure or on any error other than out-of-memory
            if i == 2 || ~strcmp(me.identifier, 'parallel:gpu:array:OOM')
                rethrow(me);
            else
                MyManagedGPUArrayType.doSomeGatheringToClawBackMemory();
                continue;
            end
        end
        break;
    end
end
So you create some script that reads a long list of functions and creates a file with all these forwarding methods in, substituting in the name of the function. Well, no, you'd create a utility function for most of this call-gather-call structure and have a much simpler repeated boiler-plate for each method.
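The utility function mentioned above might look something like this. It is a sketch under assumptions: the name callWithRetry and its arguments are hypothetical, and the recovery step is passed in as a function handle rather than hard-coded.

```matlab
function varargout = callWithRetry(fcn, recover, retryId, varargin)
% callWithRetry  Call fcn(varargin{:}). If it throws an error whose
% identifier equals retryId, run recover() once and try a second time.
% (Hypothetical helper; all names are illustrative.)
    for attempt = 1:2
        try
            [varargout{1:nargout}] = fcn(varargin{:});
            return
        catch me
            % Rethrow on the second failure or on any unrelated error.
            if attempt == 2 || ~strcmp(me.identifier, retryId)
                rethrow(me);
            end
            recover();
        end
    end
end
```

Each forwarding method would then only need to unwrap its arguments and make one call to this helper, passing the function to forward to, the gathering routine, and 'parallel:gpu:array:OOM' as the identifier to retry on.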
