Most efficient way of looking up values in an array.

9 Ansichten (letzte 30 Tage)
Johann Vormadal
Johann Vormadal am 17 Mär. 2023
Kommentiert: Johann Vormadal am 20 Mär. 2023
I have an array (a kind of lookup array that I have generated) with 8 columns and around 400 rows. In every row the first four values represent the boundaries (y1, x2, y2 and x1) of rectangular regions in a plane. The last four values are integers I want to access if a query point is within the region described by the first four values. The rectangular regions fully cover the plane upto certain limits and do not overlap, but they are not regular/structured.
If I have, say, 10million points (represented by and x vector and a y vector) and I want to find for each of those the four integers (in column 5:8) that correspond to their region, What is the fastest way of looking this up? changing the way my lookup array represents the data is not off-limits, if it helps making the process faster.
I have tried the following, which is slow, seems very inefficient and breaks when the points are exactly on a boundary between regions:
QueryPoints = [X Y]; % My 10000000 x 2 vector for query points.
n = length(X); % Determine number of query points
SearchIndex = all( [Y -X -Y X] >= permute([IndexList(:,1) -IndexList(:,2:3) IndexList(:,4)], [3 2 1]),2);
myIntegers = zeros(n,4);
for i = 1:n
myIntegers(i,:) = IndexList(SearchIndex(i,1,:),5:8); % my 10000000 x 4 vector with the set of integers corresponding with my coordinates
end
Thanks.
  4 Kommentare
Bruno Luong
Bruno Luong am 17 Mär. 2023
Bearbeitet: Bruno Luong am 17 Mär. 2023
IMO using autoexpension might be not a good idea, since the intermediarete result (before all) requires
(10e6*4*8*452)/1e9
ans = 144.6400
144 Gb, so your computer might swap to HD and it works relly slow in thid condition.
It might be just loop on your rectangles (452 iterrations) then check which 10e6 point falling inside. The amount of memory requires is 0.33 Gb and it does not need to swap.
You can even split the 4 tests in succesive tests so that the later test work on a smaller set.
Johann Vormadal
Johann Vormadal am 20 Mär. 2023
Yes, the memory requirement was one of the issues I was having, that I didn't mention.
This seems to be a good way around that issue.

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

John D'Errico
John D'Errico am 17 Mär. 2023
Bearbeitet: John D'Errico am 17 Mär. 2023
It hurts my head to see this ordering: (y1, x2, y2 and x1). Sigh. Why? Whatever floats your boat, I guess.
I would not be using a loop. Not even close. Not even remotely so. Nor would I be testing every rect, as your code does. Instead, I would be looking for a way to solve this using existing efficient code designed to solve that problem.
IF the rectangular regions fully tile a planar region, but do not overlap, I would first break each rect down into TWO triangles. The result will be a triangulation of that region. So I might now create a triangulation from those triangles. Don't use delaunay. (I don't know that your problem is convex anyway.) You already know the rects. And that means you already can compute the triangulation directly.
help triangulation
triangulation Triangulation in 2-D or 3-D triangulation supports topological and geometric queries for triangulations in 2D and 3D space. For example, you can query triangles attached to a vertex, triangle neighbor information, circumcenters, etc. You can create a triangulation directly using existing triangulation data in matrix format. Alternatively, you can create a delaunayTriangulation, which provides access to the triangulation functions. TR = triangulation(TRI, X, Y) creates a 2D triangulation from the triangulation connectivity matrix TRI and the points (X, Y). TRI is an m-by-3 matrix where m is the number of triangles. Each row of TRI is a triangle defined by vertex IDs - the row numbers of the points (X, Y). The point coordinates (X, Y) are column vectors representing the points in the triangulation. TR = triangulation(TRI, X, Y, Z) creates a 3D triangulation from the triangulation connectivity matrix TRI and the points (X, Y, Z). TRI is an m-by-3 or m-by-4 matrix that represents m triangles or tetrahedra that have 3 or 4 vertices respectively. Each row of TRI is a triangle or tetrahedron defined by vertex IDs - the row numbers of the points (X, Y, Z). The point coordinates (X, Y, Z) are column vectors representing the points in the triangulation. TR = triangulation(TRI, P) creates a triangulation from the triangulation connectivity matrix TRI and the points P. TRI is an m-by-3 or m-by-4 matrix that represents m triangles or tetrahedra that have 3 or 4 vertices respectively. Each row of TRI is a triangle or tetrahedron defined by vertex IDs - the row numbers of the points P. P is a mpts-by-ndim matrix where mpts is the number of points and ndim is the number of dimensions, 2 <= ndim <= 3. Example 1: % Load a 2D triangulation in matrix format and use the triangulation % to query the free boundary edges. load trimesh2d % This loads triangulation tri and vertex coordinates x, y trep = triangulation(tri, x,y); fe = freeBoundary(trep)'; triplot(trep); % Add the free edges in red hold on; plot(x(fe), y(fe), 'r','LineWidth',2); hold off; axis([-50 350 -50 350]); axis equal; Example 2: % Load a 3D tetrahedral triangulation in matrix format and use the % triangulation to compute the free boundary. load tetmesh % This loads triangulation tet and vertex coordinates X trep = triangulation(tet, X); [tri, Xb] = freeBoundary(trep); %Plot the surface mesh trisurf(tri, Xb(:,1), Xb(:,2), Xb(:,3), 'FaceColor', 'cyan', 'FaceAlpha', 0.8); Example 3: % Direct query of a 3D Delaunay triangulation created using % delaunayTriangulation. Compute the free boundary as in Example 2 Points = rand(50,3); dt = delaunayTriangulation(Points); [tri, Xb] = freeBoundary(dt); %Plot the surface mesh trisurf(tri, Xb(:,1), Xb(:,2), Xb(:,3), 'FaceColor', 'cyan','FaceAlpha', 0.8); triangulation methods: barycentricToCartesian - Converts the coordinates of a point from Barycentric to Cartesian cartesianToBarycentric - Converts the coordinates of a point from Cartesian to Barycentric circumcenter - Circumcenter of triangle or tetrahedron edgeAttachments - Triangles or tetrahedra attached to an edge edges - Triangulation edges faceNormal - Triangulation face normal featureEdges - Triangulation sharp edges freeBoundary - Triangulation facets referenced by only one triangle or tetrahedron incenter - Incenter of triangle or tetrahedron isConnected - Test if a pair of vertices is connected by an edge neighbors - Neighbors to a triangle or tetrahedron vertexAttachments - Triangles or tetrahedra attached to a vertex vertexNormal - Triangulation vertex normal size - Size of the triangulation ConnectivityList nearestNeighbor - Vertex closest to a specified point pointLocation - Triangle or tetrahedron containing a specified point triangulation properties: Points - The coordinates of the points in the triangulation ConnectivityList - The triangulation connectivity list See also delaunayTriangulation. Documentation for triangulation doc triangulation Other uses of triangulation polyshape/triangulation
methods triangulation
Methods for class triangulation: barycentricToCartesian edgeAttachments featureEdges isConnected pointLocation vertexAttachments cartesianToBarycentric edges freeBoundary nearestNeighbor size vertexNormal circumcenter faceNormal incenter neighbors triangulation
As you can see, triangulation has a pointLocation method. I might be using that.
help triangulation/pointLocation
pointLocation Triangle or tetrahedron containing specified point TI = pointLocation(T, QP) returns the index of the triangle/tetrahedron enclosing the query point QP. The matrix QP contains the coordinates of the query points. QP is a mpts-by-ndim matrix where mpts is the number of query points and 2 <= ndim <= 3. TI is a column vector of triangle or tetrahedron IDs corresponding to the row numbers of the triangulation connectivity matrix T.ConnectivityList. The triangle/tetrahedron enclosing the point QP(k,:) is TI(k). Returns NaN for points not located in a triangle or tetrahedron of T. TI = pointLocation(T, QX,QY) and TI = pointLocation (T, QX,QY,QZ) allow the query points to be specified in alternative column vector format when working in 2D and 3D. [TI, BC] = pointLocation(T,...) returns, in addition, the Barycentric coordinates BC. BC is a mpts-by-ndim matrix, each row BC(i,:) represents the Barycentric coordinates of QP(i,:) with respect to the enclosing TI(i). Example 1: % Point Location in 2D T = triangulation([1 2 4; 1 4 3; 2 4 5],[0 0; 2 0; 0 1; 1 1; 2 1]); % Find the triangle that contains the following query point QP = [1, 0.5]; triplot(T,'-b*'), hold on plot(QP(:,1),QP(:,2),'ro') % The query point QP is located in a triangle with index TI = 1 TI = pointLocation(T, QP) Example 2: % Point Location in 3D plus barycentric coordinate evaluation % for a Delaunay triangulation x = rand(10,1); y = rand(10,1); z = rand(10,1); DT = delaunayTriangulation(x,y,z); % Find the tetrahedra that contain the following query points QP = [0.25 0.25 0.25; 0.5 0.5 0.5] [TI, BC] = pointLocation(DT, QP) See also triangulation, triangulation.nearestNeighbor, delaunayTriangulation. Documentation for triangulation/pointLocation doc triangulation/pointLocation
Before I did too much, I might look to see if other tools, for example, tsearch might be faster.
For example:
n = 21;
[X,Y] = ndgrid(linspace(0,1,n)); % 400 rects
XYgrid = [X(:),Y(:)];
[ri,rj] = ndgrid(1:n-1);
rind = sub2ind([n,n],ri(:),rj(:));
tri = [[rind,rind+1,rind+n];[rind+n+1,rind+1,rind+n]];
triang = triangulation(tri,X(:),Y(:));
rectid = [rind;rind]; % maps each triangle into the original rect it came from
% now, generate 1e7 points in that region
xytest = rand(1e7,2);
tic
triloc = pointLocation(triang,xytest);
rectloc = rectid(triloc);
toc
Elapsed time is 7.104490 seconds.
7 seconds in the online tool, (my computer is always faster when running my own copy of MATLAB. That was only 1.5 seconds in my own copy of MATLAB.) seems reasonable to locate 1e7 points. In fact of course, I could do better knowing the actual layout of this grid. But you tell us that your rects are not so trivially arranged as I have in this example. And pointLocation should not care.
  4 Kommentare
John D'Errico
John D'Errico am 19 Mär. 2023
@Bruno Luong - sorry, but you are completely wrong in this. I showed how to use a triangulation, and the function pointLocation. Yes, I did suggest that tsearch MIGHT be an optional alternative. But I explaicity stated to use pointLocation as a working option. In fact, pointLocation does explicitly NOT need a convex triangulation.
Seriously Bruno, TRY IT if you don't know the answer.
PS = polyshape([0 0;2 0;2 1;1 1;1 2;0 2])
PS =
polyshape with properties: Vertices: [6×2 double] NumRegions: 1 NumHoles: 0
plot(PS)
PStri = triangulation(PS)
PStri =
triangulation with properties: Points: [6×2 double] ConnectivityList: [4×3 double]
PStri is a triangulation object, but not a convex one.
trimesh(PStri.ConnectivityList,PStri.Points(:,1),PStri.Points(:,2))
PStri can be used in by pointLocation.
pointLocation(PStri,2,2)
ans = NaN
pointLocation(PStri,.5,.5)
ans = 3
Convexity is not a requirement for pointLocation.
Even so, IF the domain is always a convex one, then tsearch MIGHT be faster. I have not compared the older code to see how it competes.
Bruno Luong
Bruno Luong am 19 Mär. 2023
Bearbeitet: Bruno Luong am 19 Mär. 2023
Delaunay doesn't not have to be convex If I remember one of the definntion is that that circumscrbed circle on any triangle does not contain any vertices other than the three of the triangle.
It can also derived as the geometrical "dual" of voronoi.
I believe various MATLAB tsearch uses some special property of Delaunay to make the search efficient and it breaks downs if it is not.
I remember in the pass trying to do interpolation and provide my own triangulation (for example spliiting some arbitray quadrilateral mesh coming from somewhere else) and calling closestpoint and I fail to do (or may be the process takes too much time, I can't recall the detail why I drop it).
May be I should revisit again.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

the cyclist
the cyclist am 17 Mär. 2023
Bearbeitet: the cyclist am 17 Mär. 2023
One simple step that could speed up your code a lot would be to preallocate memory for your output array. Put
myIntegers = zeros(n,4);
before your for loop.
You can read about why preallocation is important here.
  1 Kommentar
Johann Vormadal
Johann Vormadal am 17 Mär. 2023
Bearbeitet: Johann Vormadal am 17 Mär. 2023
Sorry, my Code snippet was missing that line. I am currently pre-allocating. I have editted the post to reflect this.

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Delaunay Triangulation finden Sie in Help Center und File Exchange

Produkte


Version

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by