Chunky by FelipeFS
Chunky by FelipeFS
GPWiki.org
It is currently Wed Nov 26, 2014 5:06 am

All times are UTC




Post new topic Reply to topic  [ 13 posts ] 
Author Message
 Post subject: segfault-safe coding?
PostPosted: Mon Jan 06, 2014 6:33 pm 
Shake'n'Baker
User avatar

Joined: Fri Feb 05, 2010 1:11 pm
Posts: 61
Location: MiddleEurope
It's been a loooong time since I last wrote here and in the meantime, I managed to get to grips with C++ more than anytime before.. sweet C++, though never stopping to occasionally amaze with its devious traps when it comes to memory management and hidden problems, lurking somewhere, undetected in the memory. :D

My prevalent C++ habits in coding are using whatever types necessary, structs and classes, though not using explicit, asterisk * pointers, rather passing everything by reference in the functions. Vector containers are the next most useful thing to be, dynamically updating its size depending on what number of items does it currently carry.

Still, lurking traps can occasionally backstab the running program into crash - code exception 0x0000005, called access violation, or sometimes detected in debug mode as SIGSEGV segmentation fault.
I've done a lot of googling on this matter, what all could cause segfaults. The main reasons are tampering with pointers in such manner that one obsolete or deleted pointer is still referenced and such..
The next cause should be index of out range error within an array or a vector.

As long as I never use explicit pointers by allocating them with "new" or deleting them, which parts of code are to be suspected? Those that deal with vectors, or the classes are more prone to be the cause even if they don't use any pass by pointer but everywhere pass by reference (when needed)?
Are there some guidelines how to proceed in a code that deals largely with structs, classes and vectors and regular variables to ensure what the compiler cannot see - avoiding run-time errors such as segmentation faults?
Now, I know there are warnings on the issue how to use pointers and that after every new pointer it should be deleted and set to NULL or some such policies, but is there something equivalent to this concerning data structures and vectors?

_________________
Even this world is "programmed" by a Creator, the most skilled programmer of us all. What do you think of all that exists and all the environmental phenomena? He that maketh all had programmed it all and whenever needed, He can call one of the functions with input specified by Him. :)


Top
 Profile  
 
PostPosted: Mon Jan 06, 2014 10:53 pm 
Cubic Contributor

Joined: Sun May 27, 2012 6:01 pm
Posts: 67
Hi!

I've not posted here for a while either. But hope I can help.

A segmentation fault is access to member with which you do not have access to.

The typical problems are:
* Dereferencing a pointer which is NULL can cause a segfault. So maybe wise to check to see if the pointer is valid. - Try asserts or conditional checks on pointers.
* Incorrect pointer arithmetic (Avoid doing this anyway)

Its also worth knowing;
"However, the C++ vector class operator[] does not check the subscript and doesn't throw an exception if it is out of bounds." - If you are using [] to access elements in the vector, this can also cause segfaults. Use .at() instead.

As you rightly say, these problems are dynamic/runtime errors. And im sorry but its something you have to deal with with manual memory managed languages. There are some tools that can help like valgrind.

Mikey


Top
 Profile  
 
PostPosted: Mon Jan 06, 2014 11:05 pm 
Grand Optimizer

Joined: Sun Oct 16, 2011 3:09 pm
Posts: 367
Location: Here (where else?)
References and pointer are pretty much the same thing, except references cannot point to NULL (afaik).
It's mostly just as easy to make a dangling reference as it is to make a dangling pointer. Only using references is not going to save you I am afraid :p

The guidelines I use are basically 'meaning and initialization', and 'ownership and lifetime'. Also, documenting this is important.

Initialization means that at all times it must be clear whether the contents of a variable can be trusted not to contain garbage. The simple case is when always all variables can be trusted. Rules like clearing pointers after deleting come from it.
Other options are that some variables only contain meaningful information when some special variable has a certain value (the latter variable must of course also be trustworthy of course).

Meaning is about what a value means. Usually it is quite easy, but often numeric values or enumeration values can be tricky, in particular when combined with boolean logic operators &, |, <<, and >>.

Ownership means that every piece of memory has a known owner at all times. Owners 'take care' of their owned objects, and release them at the appropriate time. The simple and common case is that ownership stays at one objects, but you can have situations where ownership gets transferred, eg malloc does that, telling you to free the memory it gives you.

Ownership prevents problems like double free, but it is also a stepping stone to lifetime. Lifetime means you know when an object will cease to exist.
You should never give away pointers or references to an object which lives shorter than the pointer or reference is used (which often boils down to the object which has the reference or pointer must not live longer than the pointed-to object).

Last but not least, imho, is documentation. The above may look trivial, and in fact it mostly is, there are no weird ideas in there. The problem however is that you have a lot of objects floating around in the code, so many that you cannot remember that all in your head for more than say a week.
So unless you write perfect code in a week and then never ever touch it again, you'll have to write down whether data members carry ownership or not, and 3 in that variable has such or so meaning.
You have to write down that the caller is responsible for freeing a returned object, and that pointers or references are valid until something happens.

Writing it down has several benefits.
First of all you write it down, which forces you to think whether it is a good idea. It's on the screen, staring into your face.
Secondly, it's a source of information which should match with the program code, ie it acts as a second opinion. If the documentation and the code disagree, something fishy is going on. If you don't have documentation, you'll never know until it segfaults.
Thirdly, another programmer (you after a week) gets the same view on your code, which is often more reliable than roughly deciphering what some arbitrary code is doing, and missing some important details (ownership or lifetime is not something you explicitly code).

_________________
My project: Messing about in FreeRCT, dev blog, and IRC #freerct at oftc.net


Top
 Profile  
 
PostPosted: Tue Jan 07, 2014 12:46 pm 
Bytewise
User avatar

Joined: Sun Aug 05, 2012 9:32 pm
Posts: 272
Thanks Alberth, that was actually pretty educational :) I almost never write good commentary on my code and as you pointed out, I then end up looking at my own older code and I get pretty confused and have to rediscover a lot of things :D

_________________
Did you ever wonder, how time works?


Top
 Profile  
 
PostPosted: Tue Jan 07, 2014 7:24 pm 
Shake'n'Baker
User avatar

Joined: Fri Feb 05, 2010 1:11 pm
Posts: 61
Location: MiddleEurope
Thanx for everything.
It seems the memory management really plays all the roles here, be it a pointer, a reference or a vector of objects.
Quite a challenge it seems - an efficient programmer should really be aware all the times what s/he's doing, each line from up to down when it comes to creating objects of a class, and then.. (what with them? delete them? nope, delete can only be applied to pointers initialized by "new"). Hmm.. this really needs to take some time to swallow all the concept and get it into mind, permanently, and thus, be immune to the most painful - the unpredictable run-time errors.

_________________
Even this world is "programmed" by a Creator, the most skilled programmer of us all. What do you think of all that exists and all the environmental phenomena? He that maketh all had programmed it all and whenever needed, He can call one of the functions with input specified by Him. :)


Top
 Profile  
 
PostPosted: Tue Jan 07, 2014 8:25 pm 
Grand Optimizer

Joined: Sun Oct 16, 2011 3:09 pm
Posts: 367
Location: Here (where else?)
Hazarth wrote:
Thanks Alberth, that was actually pretty educational :) I almost never write good commentary on my code and as you pointed out, I then end up looking at my own older code and I get pretty confused and have to rediscover a lot of things :D
Until several years ago I also didn't write anything down. Then a team-mate suggested we'd start documenting each function. I thought, "sure, why not", not expecting much from it. However, I found it increased my productivity.
Just those few lines of text is sufficient to program a function-call without having to read its code. Just supply what the text says, and it will work even if the code is 2 years old. I think it works, because in text you can explain in a few words what the intention is in a few words, instead of having to decipher intention from a long sequence of calculation steps.

_________________
My project: Messing about in FreeRCT, dev blog, and IRC #freerct at oftc.net


Top
 Profile  
 
PostPosted: Tue Jan 07, 2014 9:52 pm 
Grand Optimizer

Joined: Sun Oct 16, 2011 3:09 pm
Posts: 367
Location: Here (where else?)
ElectroPaladin wrote:
It seems the memory management really plays all the roles here, be it a pointer, a reference or a vector of objects.
It's a big source of errors, but not the only one. Index errors and off-by-one errors are also a good source :p

My defense against index errors is to use one form of arrays and stick with them. I use 0-based arrays since my youth, and can program them faultlessly in my sleep now :p
I find off-by-one errors are best detected by programming the code, then use a simple case (usually the first and/or the last entry), and check the index expression is correct by computing the index value in your mind. It also keeps your brain trained in doing calculations with numbers, which is always a useful skill :)

Finally, pointer manipulations are always fun. Coding a routine to insert or remove a node in a double linked list or so works best for me by drawing the start situation at a piece of paper, then coding a statement, update the situation at the paper, code another statement, etc. I often also try a few other edge cases on paper that way, to make sure it always works.

ElectroPaladin wrote:
Quite a challenge it seems - an efficient programmer should really be aware all the times what s/he's doing, each line from up to down when it comes to creating objects of a class, and then.. (what with them? delete them? nope, delete can only be applied to pointers initialized by "new"). Hmm.. this really needs to take some time to swallow all the concept and get it into mind, permanently, and thus, be immune to the most painful - the unpredictable run-time errors.
There are other tricks to reduce the amount of work. One thing I do is being very paranoid, I don't trust myself too much with the zillion of details I need to take care of. My code is littered with assert checks, so the computer checks at all the time whether my ideas hold.

As soon as I find that some condition should always hold at some point in the code, I throw in an assert to check it. This often happens on input values for a function, or output results, but also eg when computing a size, allocating memory, and then filling the memory. At the end, the size of the filled memory should match with the computed size.
Similarly, when you expect a pointer to be assigned only once, check that it's NULL before the assignment (assuming you can trust its value). If you are afraid of memory leaks, check that a pointer is indeed NULL just before going out of scope, for example in a destructor.
Adding "default: assert(false);" in a switch when you don' expect unhandled values.

It may seem weird to add these checks, and cause the program to crash more often. However, it's intentional. The idea behind it is that the program is checking things that never happen. Thus if it does happen, something is very seriously wrong. Continuing is basically pointless, since basic assumptions about the program are broken.
If you do this right, the number of crashes due to failed checks is much larger than the number of crashes due to other causes.
While it's bad that it crashes, the good news is that you caught it before a real crash, ie you're closer to the real cause. In my experience, it's usually also pretty good reproducable, which is good for bug hunting.
Also, you found an "impossible" situation, which is often a good clue as to what can be wrong. (And in quite a few times, the computer was right, and my ideas were wrong :) )

A final trick I do is to have the system generate a core dump on a crash, and always run a binary with debugging data attached to it. It makes that I can load a core dump into the debugger, and I can look at the stack, examine values etc. Very useful to get an idea of what the program was attempting to do, just by looking at names of the nested function calls.

_________________
My project: Messing about in FreeRCT, dev blog, and IRC #freerct at oftc.net


Top
 Profile  
 
PostPosted: Tue Jan 07, 2014 10:55 pm 
Shake'n'Baker
User avatar

Joined: Fri Feb 05, 2010 1:11 pm
Posts: 61
Location: MiddleEurope
Whoaa, a much beneficial response! :) Now I understand... checking boundaries of everything, to ensure nothing is out of scope or out of index range within arrays.

Just remembered that I caught a glimpse of something as "uninitialized values" problem. Not knowing whether it is directly responsible for the havoc in memory if it's never being used for now, and rather written in the code for future uses.

When I try to start and code larger projects than smaller apps, I actually do a lot of this:
for example:
Code:
class Character{
public:
   
   string CharacterIDname;
   string CharacterName;
   int map_Position[2];
   // ...and innumerable stats may follow afterwards
   int stat_hitpoints;
   int stat_hitpointsMAX;
   int stat_strength;
   int stat_perception;
   int stat_dexterity;
   int stat_intelligence;
   int stat_endurance;
 
   unsigned int race_type; // e.g. 0 - unused, 1 - human, 2 - elf, 3 - dwarf, 4 - troll...
   mySprite &spriteUsed;

 ... ... etc. etc.

   // now, time for constructor (or without it; but then, everything'll need to be initialized manually after the creation of each class object)

   Character(string nameID, string name, int pos[2]);

};

// constructor definition
Character::Character(string nameID, string name, int pos[2]){
   CharacterIDname = nameID;
   CharacterName = name;
   map_Position = {pos[0],pos[1]};
}



As is it obviously seen, I do not initialize any other variables yet, (the stats are currently not initialized) only those I'm currently working with, cos these I'm working with right now, are gonna be used right after the declarations of the class.. each object will have its own unique object member values - that's what OOP is for.
But the question is whether this is a problem that should not be appearing in the code, you know, I mean, keeping uninitialized class/object members in the code (not even hidden by comment marks // ) until the time the attention comes to them and scripting the stats will be required.

_________________
Even this world is "programmed" by a Creator, the most skilled programmer of us all. What do you think of all that exists and all the environmental phenomena? He that maketh all had programmed it all and whenever needed, He can call one of the functions with input specified by Him. :)


Top
 Profile  
 
PostPosted: Wed Jan 08, 2014 6:34 pm 
Grand Optimizer

Joined: Sun Oct 16, 2011 3:09 pm
Posts: 367
Location: Here (where else?)
Variables that you never use only take some space in the objects, and nothing much else. They are harmless otherwise :)
The only risk that you run is that you forget which variables you use and which ones you don't use.

_________________
My project: Messing about in FreeRCT, dev blog, and IRC #freerct at oftc.net


Top
 Profile  
 
PostPosted: Mon Jan 13, 2014 2:17 pm 
Shake'n'Baker
User avatar

Joined: Fri Feb 05, 2010 1:11 pm
Posts: 61
Location: MiddleEurope
Now, I think have discovered the swine of a problem. Just hoping this'll get me somewhere.. :\

Now, I don't know if I must start hating vectors or what... the classes or structs in combination with vectors with their push_back commands seem to be the evil that was hidden until, well, until I HAD to discover.

Such obvious glaring memory leak as is seeing odd number with several digits that "should be" rather index 0, just cannot be overlooked. I caught it red-handed with cout << ... << endl; command.

Now I am really confused. What does the darn push_back function really does? I use it as the only means to add items into vector of items.
I like vectors because of the dynamic size allocation - anytime when I need, I can add an item, or erase item from it, and the size is updated consequently after that. But the push_back routine is starting to dissapoint me.

Am I really doing something wrong when I have multiple classes I want to make objects from and also make vectors of those classes to be containers of class objects according to their types?

Code:
class Character{
   public:
   string NameIDX;
   string name;
   int type; // numeric assignment of types from 0 to whatever... e.g. 0-human, 1-elf, 2-goblin.. etc.
   ...etc. etc. ...

  // add a constructor (or without - in case of without, assign values to members manually after creating of each class object)
   Character(string IDname, string nameGiven, int typeGiven);

   // and some destructor, arbitrarily...
   ~Character();
};

Character::Character(string IDname, string nameGiven, int typeGiven){
   NameIDX = IDname;
   name = nameGiven;
   type = typeGiven;
}

Character::~Character(){
}

// ok, we have our class set up...
// now, we can make a lot of similar classes, with or without their constructors/destructors

  // we also have some class that mix all class types into universal map objects
Class MapObject{

   string NameID;
   int MapPos[2]; // tile position on map (2-Dim)
   string objectType; // "Character" for character type, "Scenery" for scenery type, "Container" for container type etc.
   
   int layer; // layer order of the object to be drawn on map (will be updated permanently in the loop)

};



// some global vars... (vectors of classes needed!)
   
   // vector of Characters
   vector<Character> AllCharacters;
   // vector of Sceneries
   vector<Scenery> AllScenery;


  // master-vector of all objects
   vector<MapObject> AllMapObjects;




// time for initialization

void Init(){

// at the start, setup some characters

   Character PlayerChar("playerCharacter", "giveSomeName", 0);

   Character NPC_human("NPCHuman_test01", "Ulfmar", 0);

   // assign some script attached to it
   NPC_human.ScriptAttached = "scr_HumanTest01.txt";
   NPC_human.spriteAssigned = SomeHumanSprite; // the object of some sprite class

   Character NPC_elf("NPCElf_test01", "Elwynn", 1);
   NPC_elf.ScriptAttached = "scr_ElfTest01.txt";
   NPC_elf.spriteAssigned = SomeElfSprite; // the object of some sprite class

  // enough for now; add it into Characters container

  AllCharacters.push_back(PlayerChar);
  AllCharacters.push_back(NPC_human);
  AllCharacters.push_back(NPC_elf);

  for(unsigned int i=0; i<AllCharacters.size();i++){
      cout << "Character: " << AllCharacters[i].NameIDX << ", with name: " << AllCharacters[i].name<< " allocated successfully." << endl;}

// ok, that's it for now; from the console output it seems everything is ok. Worse part comes when I try to find them by their index names;
// sometimes it works, but there's still at least one entry, having odd array index (as though pointing to some memory leak) - some odd mention is that the console doesn't even want to print the message from the function

cout << FindCharacterIDX(PlayerChar) << ", " << FindCharacterIDX(NPC_human)  << ", " << FindCharacterIDX(NPC_elf) << endl;

// when this is printed out, sometimes one of these three characters to find is omitted and instead of finding the index (which can possibly be only either 0,1, or 2 - cos, they're in order), the console prints some 1243001, blah blah number... obviously a wrong memory place.

}


// here we have our function to find characters by Name index
unsigned int FindCharacterIDX(string FindNameIDX){
   // look up for every entry in the characters' container
   for(unsigned int i=0; i<AllCharacters.size();i++){
        if(AllCharacters[i].NameIDX == FindNameIDX) {
                cout << AllCharacters[i].NameIDX << " found at entry: " << i << endl;
                return i;
         }
   }

}



This happened to me not once, and when it happened, one whole object was completely not visible.
In debug mode, oddly enough, it pretends nothing is wrong, but in release mode, crash sigsegv on run...

Do I something wrong with approaching every addition to Characters container by push_back everything?

_________________
Even this world is "programmed" by a Creator, the most skilled programmer of us all. What do you think of all that exists and all the environmental phenomena? He that maketh all had programmed it all and whenever needed, He can call one of the functions with input specified by Him. :)


Top
 Profile  
 
PostPosted: Mon Jan 13, 2014 7:45 pm 
Shake'n'Baker
User avatar

Joined: Fri Feb 05, 2010 1:11 pm
Posts: 61
Location: MiddleEurope
ehmm.. sorry for this last post... maybe the fault is somewhere else.. :rolleyes

_________________
Even this world is "programmed" by a Creator, the most skilled programmer of us all. What do you think of all that exists and all the environmental phenomena? He that maketh all had programmed it all and whenever needed, He can call one of the functions with input specified by Him. :)


Top
 Profile  
 
PostPosted: Tue Jan 14, 2014 9:43 pm 
Grand Optimizer

Joined: Sun Oct 16, 2011 3:09 pm
Posts: 367
Location: Here (where else?)
Hi,

Do you know about copy constructors?

Code:
vector<Character> AllCharacters;
Character PlayerChar("playerCharacter", "giveSomeName", 0);
AllCharacters.push_back(PlayerChar);
The push_back makes a copy of the data in the PlayerChar, into the first element of AllCharacters.
In other words, after the push_back, there are 2 objects containing the same IDname, namely PlayerChar.NameIDX and AllCharacters[0].NameIDX.

It uses Character::Character(const Character &ExistingChar) if you have that defined. Otherwise, it creates its own copy constructor, by copying each element separately. In the code above you are safe, since "int" and "string" can be copied like that. Pointers and references in objects typically need special consideration here.

What I don't understand is "FindCharacterIDX(PlayerChar)". You are giving the PlayerChar object to the function. Unless you use a reference in the formal parameter, you are thus making a copy of the object for the function call. Should still be safe, but it may be not what you think it does.

As for the weird number, I suspect it happens when none of the elements matches, ie "AllCharacters[i].NameIDX == FindNameIDX" fails for every i
Then it drops out of the for-loop, and returns from the call without a proper "return X" value, so you're getting a random number.

Add an "assert(false);" below the for-loop, if you think the computer should never get there :)

_________________
My project: Messing about in FreeRCT, dev blog, and IRC #freerct at oftc.net


Top
 Profile  
 
PostPosted: Wed Jan 15, 2014 11:19 am 
Shake'n'Baker
User avatar

Joined: Fri Feb 05, 2010 1:11 pm
Posts: 61
Location: MiddleEurope
Thanks for reply, Alberth.

I spotted a mention of copy constructors on the net, while googling on this matter of problem and have only a brief, vague knowledge of them. There was such a mention somewhere that you have to specify the copy constructors yourself, or otherwise it'll fill the code that belongs them automatically (that was sometimes described as wrong for the problem the people had). Hopefully, I won't have to care about the definition of copy constructors, unless it becomes a problem in a long run...

Alberth wrote:
What I don't understand is "FindCharacterIDX(PlayerChar)". You are giving the PlayerChar object to the function. Unless you use a reference in the formal parameter, you are thus making a copy of the object for the function call. Should still be safe, but it may be not what you think it does.


sorry, just reviewed my posted code and found out I make a mistake in it...
It indeed needs to be FindCharacterIDX("playerCharacter"), FindCharacterIDX("NPCHuman_test01") and FindCharacterIDX("NPCElf_test01")... as long as the function takes string as argument (and NameIDX member is a string) - even compiler would warn me about that, should I write something like that before :D

Thank you Albert, just fixed the problem. Sometimes, it's just a matter of a code management - I've done some changes in the code by splitting the class Character into Character prototype and it's object instance on the map:

class CharacterProto {...} and
struct CharacterObject {...}

prototype, carrying all static data and object, carrying derived and dynamic data that is meant to frequently change in a program loop.


As for the finding the character objects by their NameIdx's, I found out I had two character object instances of one character proto. As long as the objects take their NameIdx's from their protos, it just happened that two objects shared the same index name, rendering only one of the objects visible when processed.
How did I solve it? Just by ensuring unique names for each object (!!! note that it was originally intended that each object MUST have its own index name - unless such object is unique and there can be only one demonstration of the object)

For example let's say we have three goblins of same Goblin prototype,
and so:
Code:

CharacterProto GoblinWeak;
GoblinWeak.InitializeCharProto(...); // initialize it, giving it all the needed assigned members...

// object demonstrations of the proto
unsigned int howManyGoblins = 3;
CharacterObject someGoblins[howManyGoblins];

for(unsigned int i=0; i<howManyGoblins; i++){
    someGoblins[i].NameIDX = GoblinWeak.NameIDX + convertIntToString(i);
}



After doing this, I encountered no issue about the stuff.

some educational lesson from this: If someone's making a game in this fashion of a code, never forget to ensure that each object must have its own unique member that serves as a means to find that object (in my case, finding by string - indexing name)

_________________
Even this world is "programmed" by a Creator, the most skilled programmer of us all. What do you think of all that exists and all the environmental phenomena? He that maketh all had programmed it all and whenever needed, He can call one of the functions with input specified by Him. :)


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 13 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
cron

Powered by phpBB® Forum Software © phpBB Group