Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector and Matrix with compile-time known size #76

Closed
andreadelprete opened this issue Oct 3, 2013 · 11 comments
Closed

Vector and Matrix with compile-time known size #76

andreadelprete opened this issue Oct 3, 2013 · 11 comments
Assignees
Labels
Issue Type: Feat/Enh Req This issue requests some new feature or enhancement

Comments

@andreadelprete
Copy link

Taking inspiration from the linear algebra library Eigen (http://eigen.tuxfamily.org/dox/index.html) we could make Vector and Matrix template classes, with two template parameters:

  • _Rows: Number of rows, or Dynamic
  • _Cols: Number of columns, or Dynamic

This way, when the size is known at compile time, memory can be allocated statically. Moreover, there is no need to resize data, because the size is already specified in the data type.
Following the example of Eigen, if the user tries to resize a fixed-size vector/matrix to a size that is different from what specified initially, an assert(false) is executed.

@ghost ghost assigned andreadelprete Oct 3, 2013
@andreadelprete
Copy link
Author

I report here some notes about the design choices, so that if someone is interested can follow the progress of this development and can also contribute by giving suggestions.

DESIGN IDEA

My idea would be to add a new template class VectorN that inherits from Vector. This way, all the code that works with Vector can also work with VectorN.

ALTERNATIVE DESIGN (DISCARDED)

I also considered using VectorN as parent class, and making Vector inherits from VectorN<-1> (where the size -1 would indicate a nonconstant size, unknown at compile time). However, this design would not allow to reuse all the code that works with Vector, because VectorN would not be a son of Vector.

DRAWBACKS

The drawback of making VectorN inherit from Vector is that VectorN has an extra member variables that it doesn't use (i.e. VectorOf storage). However I modified the behavior of the default constructor of VectorOf so that it doesn't allocate memory. This way, the useless storage variable contains only these member variables:

  • yarp::os::ManagedBytes bytes;
  • double *first;
  • size_t len;

In turns, ManagedBytes contains these member variables:

  • Bytes b;
  • bool owned;
  • size_t use;
  • bool use_set;

Finally, Bytes contains:

  • char *data;
  • size_t len;

So in the end the extra memory taken by an empty VectorOf is:

  • 2 pointers (double *first; char *data;)
  • 3 unsigned int (size_t len; size_t use; size_t len)
  • 2 bool (bool owned; bool use_set;)

@paulfitz
Copy link
Member

paulfitz commented Oct 4, 2013

@andreadelprete this sounds like it will be a great addition. Your approach looks good to me. Thanks for taking the time to post your reasoning.

@andreadelprete
Copy link
Author

Thank You @paulfitz for reading what I wrote!

Another possible improvement to Vector is to avoid dynamic memory allocation here:

void Vector::allocGslData()
{
    gsl_vector *vect=new gsl_vector;
    gsl_block *bl=new gsl_block;
    vect->block=bl;
    vect->owner=1;
    vect->stride=1;
    gslData=vect;
}

by making gsl_vector vect and gsl_block bl protected member variables of the class, and changing the method to:

void Vector::allocGslData()
{
    vect.block=&bl;
    vect.owner=1;
    vect.stride=1;
    gslData=&vect;
}

but I wonder whether I am missing a good reason for allocating vect and bl dynamically (maybe @lornat75 can help me on this).

@barbalberto
Copy link
Collaborator

About this change I like the idea, and I'll stress the fact that the
size of the array will be very clear for the user, to avoid misuse.

I also appreciate the change of the default size of the vector to zero.
Before the constructor of vectorOf was allocating by default 16 elements
of the type, this fact was hiding a serious bug inside the
controlBoardWrapper2 who took me a lot of time to discover; while if the
default size was zero, that bug would have caused a crash and therefore
spotted long time ago and fixed.

I say this because default values can hide bugs as in the CBW2 case; so
changing it, or changing the allocation procedure, can unveil new
bugs... be ready for it :)

Alberto

On 10/04/2013 05:09 PM, Andrea Del Prete wrote:

Thank You Paul for reading what I wrote!

Another possible improvement to /Vector/ is to avoid dynamic memory
allocation here:

|void Vector::allocGslData()
{
gsl_vector *vect=new gsl_vector;
gsl_block *bl=new gsl_block;
vect->block=bl;
vect->owner=1;
vect->stride=1;
gslData=vect;
}
|

by making /gsl_vector vect/ and /gsl_block bl/ protected member
variables of the class, and changing the method to:

|void Vector::allocGslData()
{
vect.block=&bl;
vect.owner=1;
vect.stride=1;
gslData=&vect;
}
|

but I wonder whether I am missing a good reason for allocating /vect/
and /bl/ dynamically.


Reply to this email directly or view it on GitHub
#76 (comment).

@andreadelprete
Copy link
Author

I pushed this change (for the class Vector only) on a fork of YARP and I'll ask for a pull request before doing the same for Matrix.

Regarding my previous comment on the dynamic allocation of gsl_vector and gsl_block I tried implementing the change and I've seen there are problems generated by including the header file yarp/gsl_compatibility.h in Vector.h (rather than in Vector.cpp), so I decided to leave this as it is, for now.

@andreadelprete
Copy link
Author

@barbalberto I quite agree with you, we should be ready for possible bugs.

subVector and setSubvector

Another thing I'd like to change is the behavior of the subVector and setSubvector methods when there are inconsistent sizes (i.e. index out of range). At the moment subVector returns an empty vector, whereas setSubvector simply returns false. This is quite dangerous because it can hide bugs, so I'd rather a crash in case of size inconsistency, because it is the sign of a bug in the code. This issue regards also other methods, some of which already behaves as I'd like (e.g. all the operators for matrix/vector algebra); in my opinion we should make all methods deal in the same way to this kind of errors.

@andreadelprete
Copy link
Author

I just had a big change of mind. Now I think that the best design is the one I discarded in my second post (see "Alternative Design (discarded)"). I'll try and explain why solution B is better than solution A

Solution A (VectorN inherits from Vector)

Pros:

  • all methods that work with Vector can also work with VectorN

Cons:

  • most of the times, using VectorN with a method designed to work with Vector, you lose the benefits of VectorN (because VectorN is converted to Vector, so you have dynamic memory allocation)
  • to take advantage of VectorN you should write new libraries that use VectorN, but these wont work with Vector (unless of course you don't write two versions of each method, one for Vector and one for VectorN)

Solution B (Vector inherits from VectorN)

Pros:

  • writing future libraries to work with VectorN they will automatically work with Vector as well
  • existing libraries can be changed to work with VectorN (only if current performances are not good enough)

Cons:

  • all Yarp and iCub libraries that currently work with Vector wont be directly compatible with VectorN (but we can fix this introducing an easy way to convert a VectorN into a Vector)

Conclusions

There are rare cases in which solution A is better, that is when you use a method that takes a pointer/reference to a Vector and does not allocate any new Vector. In this case, using it with the new VectorN, you get the performance boost for free. However, I think this case is extremely rare, so it is not significant. In all other cases solution B is better, because it allows a simpler design of future libraries, while leaving the option to boost existing libraries with limited effort.

Of course, I'd really like a feedback on this, because I am sure of nothing! :)

@paulfitz
Copy link
Member

paulfitz commented Oct 7, 2013

Adding one more option: a common base class capturing the commonality, rather than trying to force two things that don't really fit into an inheritance relationship? To be clear though, your existing soln is ok by me.

@andreadelprete
Copy link
Author

I think I see what you mean Paul and it sounds like a good idea. Just to check whether I understood your suggestion: you would add an abstract class VectorBase from which Vector and VectorN can inherit. VectorBase woulde define the interface of a vector and it could also implement all the methods having the same behavior.
The main advantage I see is that, in this way, Vector doesn't need to inherit everything from VectorN (such as the static memory storage). Are there any other benefits I can't see?

@paulfitz
Copy link
Member

paulfitz commented Oct 9, 2013

@andreadelprete the main benefit is absence of surprise, so you don't make promises in the documentation/interface for a class (e.g. ability to resize, or a guarantee of fixed size) that are broken by a subclass the user might not know about.

@andreadelprete
Copy link
Author

After a long discussion with @lornat75 I decided to give up (at least for now), but I report here what Lorenzo and I identified as the main challenges.

VectorBase father, Vector and VectorN sons

This inheritance relationship wouldn't allow to fully benefit from the new fixed-size vectors. Take for instance a function that takes a VectorBase as input and that, inside, needs to create a temporary vector having the same size of the input vector. Since VectorBase doesn't have compile-time known size, the temporary vector has to be created with dynamic memory allocation.

VectorN father, Vector son

This inheritance relationship (that's basically the one used by Eigen) is better because it allows to write template functions working with the template VectorN. In this way, internal temporary vectors can be created through static memory allocation, because they have compile-time known size.

Compatibility with GSL

This is the main reason why I decided to give up.
Many Yarp math operations rely on GSL, but no GSL header file is included in Yarp header files (design choice).
Keeping this design is extremely problematic when working with templates, because you cannot implement templates in cpp file. This means you'd need to include GSL headers in Yarp headers, which is against Yarp policy.

Most probably there are ways to work around this issue, but they would take a lot of time to implement.
A possible solution Lorenzo came up with is to add another class VectorGsl, which is father of VectorN.
Given that VectorGSL is not a template it could implement the GSL related part of Vector inside a cpp file.
Then, all the math operators using GSL functions could be based on VectorGSL, so that they wouldn't be template functions and they too could be implemented inside cpp files. All other functions that do not rely on GSL could be based on VectorN, so that they would exploit the static memory allocation of temporary vectors.

Moreover, at the moment Vector creates the GSL structure with dynamic memory allocation (see method allocGslData). This would need to be changed if we want to have a totally static memory vector, but it is difficult to do it without including GSL headers inside Yarp headers.

Another possible problem Lorenzo foresees is due to the export of templates with Windows DLLs (I don't know the details, but there have been problems in the past).

I hope some valorous guy one day will implement this feature, but I'm afraid it's not something you can do in a couple weeks, as I was hoping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue Type: Feat/Enh Req This issue requests some new feature or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants