Back to Basics: Two-Dimensional Arrays

I had planned to write an article on ECMAScript Harmony and my ideas for corresponding XPIDL changes… but four months after its announcement, I find I still don’t know what’s in the Harmony spec! (We know a few things that are out… but since the initial blog surge, it’s been quiet.)

So I’ve decided to put my ideas for XPIDL on the side, and move on to the last part in this series: my attempt at creating a XPCOM component for two-dimensional arrays. Read on for further details.

Why build a component for this?

To be honest, I’m not sure why. I can’t point to any code in my code base that currently needs it. I can’t think of any requirements in Verbosio that say I must support a two-dimensional array. I just don’t need it right now.

I never did find any 2-D array structures in Mozilla code either, other than the JavaScript convention for an array of arrays. I suspect it’s for the same reason: no one needs it. The only similar example I know of is the nsIDOMSVGMatrix interface – and that’s numbers.

Even so, I could not imagine why we wouldn’t have a 2-D array class to pass between XPCOM components, as a base class. I don’t know why – it just felt like a missing piece. Despite the rational side telling me it’s a waste of time, I could not prevent myself from trying to build one.

Two ways to view the data

Typically, a 2-D array looks like a table:

0 1 2
0 A B C
1 D E F
2 G H I

In JavaScript, this might look like:

var data = [
["A", "B", "C"],
["D", "E", "F"],
["G", "H", "I"]
]

To me, though, there’s a second way to view this data:

Key  
x y Value
0 0 A
0 1 B
0 2 C
1 0 D
1 1 E
1 2 F
2 0 G
2 1 H
2 2 I

If we take the combined key as one column, and the values as another, our 2-D array can become an one-dimensional array, at least internally. We could also store this condensed array as a hashtable.

Truthfully, I did not know which way to go. In the hashtable, I could have fast look-up of a single cell’s value, but I would pay a price for grabbing a whole row or column. In an array of arrays design, I could end up wasting a lot of memory. There might also be a third design I’ve overlooked – some complex linked list structure, I suppose – but then I might really have to worry about memory management. No thanks.

I explored the hashtable option, and gave it up when I thought about getting whole rows and columns. I then decided to go the array of arrays approach.

The component and its interface

Before I wrote the “Data Matrix” component, I wrote the interfaces, clearly spelling out what I wanted to support: nsIDataMatrix. You can ignore the nsIVariantMatrix and nsIWritableVariantMatrix – those are holdovers from an earlier approach, and I will remove them soon.

I decided on two interfaces – nsIDataMatrix and nsIWritableDataMatrix – based on the separation between read-only interfaces in Mozilla and read-write interfaces. As I said before, I think it’s a really good idea, and useful in the right contexts.

I also decided to write the nsWritableDataMatrix component in C++ because I didn’t want to pay the XPConnect performance tax. Plus, there’s another advantage – for C++ programmers importing my nsWritableDataMatrix.h file, I can easily extend it to provide array indices. a[b][c] comes to mind.

As for JavaScript, I might be able to do something similar to DOM Class Info’s nsArraySH class. This, I think, is how we’re able to say elem.childNodes[4] instead of elem.childNodes.getItemAt(4). If I could do the same for JavaScript, then this component becomes a lot easier to work with.

Conclusion: More than two dimensions?

Forget it. With more than two dimensions, you could easily start running into out-of-memory problems. If I were to do a n-dimensional component structure, I’d probably skip the array-of-arrays-of-arrays-of… approach entirely. I might store the data two ways: one in a hashtable for individual cells, and another in a structure like this, for iterating along axes of the matrix:

class DataCell;
struct AxisCoordinate {
PRUint8 axisNumber;
PRUint8 indexOnAxis;
DataCell* nextCellOnAxis;
};
class DataCell {
private:
nsISupports* mData;
nsTArray<AxisCoordinate> coordinates;
};

It would waste a bit of memory, I’m sure, but it could still operate fairly fast. I don’t want to draw up the XPIDL interfaces I’d expose for a n-dimensional array, either.

I’m also sure someone’s come up with a more efficient algorithm than my code – yet another example of why I should probably take a computer science course one day. 🙂 As always, thanks for reading, and the comments section is open.

One thought on “Back to Basics: Two-Dimensional Arrays”

  1. The way I always do multidimensional arrays is to just do a very basic wrapper around a flat 1D array.
    For example (written in D—opCall is basically the constructor, and opIndex is the subscript operator overload):

    struct Float2D
    {
    float[] data;
    size_t w,h;
    Float2D opCall(size_t w, size_t h)
    {
    Float2D result;
    result.data = new float[](w*h);
    result.w = w;
    result.h = h;
    return result;
    }
    float opIndex(size_t i, size_t j)
    {
    return data[i+w*j];
    }
    }

    Used like so:

    auto array2d = Float2D(3,4);
    writefln("Value 2 across, 1 down: %f", array2d[2,1]);

    No idea how that’d be translated into Mozilla funny-buggers… 😛

Comments are closed.