Strings weren't really a big issue in the design of the language C. It's like somebody forgot that there were humans that were going to be using the programs, and humans like to look at text, not bits. C has a data type called "char" which is really the same as an int, except it's limited to one byte and several of the standard-library functions use it to display ASCII characters.
The rest is up to you - C programmers have been using arrays of chars since the dawn of time to write strings. Unfortunately, arrays in C aren't so nice either.
Now C++ comes along, and adds the type "string". Well, not exactly. Other types, like int, float and char are an integral part of the language (even the built-in type bool was added by C++). But the string type is still not. It's written, in the C++ language itself, as an extension. (It can be found in the include file <string>). And a lot of library functions still use char arrays. So, you're still going to have to use them some of the time.
This short article explores how the memory surrounding char arrays works. The main focus is on how to avoid memory leaks with various different types of char arrays.
Note: We will be using the language C++ because we don't want to get bogged down in the specifics of malloc. You could apply everything here to C as well. All of the code is portable to C, except that you'll need to change new to malloc (and delete to free), bool to int or char, true to 1, and // comments to /* */.
The big problem with arrays is that, unlike regular types and even pointers, which can be passed in and out of functions 'til the cows come home, arrays come with this annoying "allocated memory" attached. Somebody has to own that memory and release it, which muddies the whole principle of abstraction.
Writing functions which accept char arrays as arguments is, on the whole, fine. You use the array for whatever you want it for, then you forget about it. Whichever function passed it in is responsible for managing its memory and deleting it when necessary.
The real problem comes with functions that need to return strings (or arrays). Especially when you aren't using classes, the function is never going to see that pointer again. Whenever you allocate memory (using malloc or new), you are supposed to free it (using free or delete). Remember: if you use malloc, always use free; if you use new, always use delete; and if you use new[], always use delete[]. That causes a problem if you are going to pass out a pointer to allocated memory and never see it again.
Therefore, you must rely on the calling procedure to delete strings and arrays which you passed out. You must also apply this notion to strings and arrays you may have gotten from library functions.
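The pairing rule above (malloc with free, new with delete, new[] with delete[]) can be sketched as follows. The function name pairingDemo and its return value are made up purely for illustration, and the sketch assumes a C++11 compiler; it is not meant as the one right way to organise allocations.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical demo: allocate in three different ways and release each
// block with its matching call. Returns the number of characters stored,
// just so the result can be checked.
std::size_t pairingDemo()
{
    std::size_t total = 0;

    char* single = new char('H');      // new      -> delete
    total += 1;
    delete single;

    char* array = new char[4];         // new[]    -> delete[]
    std::strcpy(array, "Hi!");
    total += std::strlen(array);
    delete[] array;

    char* raw = static_cast<char*>(std::malloc(4));   // malloc -> free
    if (raw != nullptr)
    {
        std::strcpy(raw, "Hi!");
        total += std::strlen(raw);
        std::free(raw);
    }

    return total;
}
```

Mixing the pairs (for example, calling delete on memory that came from malloc) is undefined behaviour, even if it happens to work on a particular compiler.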
We are going to look at a program written in three different ways, with three different examples of a function passing a string back to the main. Each example has a different problem. We'll see when you should use each one.
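We will be looking for three things: that the intended message is printed, that there is no memory leak, and that the function is flexible enough to return any string an algorithm might produce.
As you may guess, each of the three examples breaks one of the above points. Try to guess which ones ;)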
Note: To check for memory leak in a program (in Windows XP), press Ctrl+Alt+Delete, go to the Processes tab, find your program in the list, and look at the Mem Usage column. Note that these tests loop forever, so you will need to terminate them yourself when you finish observing.
Here is the first example:
#include <stdio.h>

char* getString();

int main()
{
    char* c;
    while (true)
    { // Test for memory leak with an infinite loop
        c = getString(); // Retrieve the string
        printf("%s", c); // Print it to the console
    }
}

char* getString()
{
    /* "Hello World!" would be nice, but too
       much effort in the later examples */
    return "Hi!";
}
For each example, we will use the same main function, and only change the contents of getString.
In this example, we are simply returning a C string literal containing the message "Hi!".
Output:
Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!{infinite}
Memory Leak: No
The above example does not have a leak, and it works perfectly, but it has one problem, and that is, it isn't particularly useful for most algorithms. True, some functions return a pre-set string literal (possibly from a range of choices), but it doesn't give you much freedom. So, use it when you can. Otherwise, read on...
#include <stdio.h>

char* getString();

int main()
{
    char* c;
    while (true)
    { // Test for memory leak with an infinite loop
        c = getString(); // Retrieve the string
        printf("%s", c); // Print it to the console
    }
}

char* getString()
{
    char c[4];
    c[0] = 'H';
    c[1] = 'i';
    c[2] = '!';
    c[3] = 0;
    return c;
}
In this example, we are declaring a local array of chars, with 4 elements, and assigning the chars 'H', 'i' and '!' to the array. We've even assigned the all-important 0-byte null-terminating character, to properly finish the string.
Output:
?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦?_¦{infinite}
(Could be any random characters, or even a crash, depending on a number of factors. Reading the array after the function has returned is undefined behaviour.)
Memory Leak: No
Great! Still no memory leak. Little problem, though. The message is being totally garbled. We'll take a look at the causes later. Here's the third and final example:
#include <stdio.h>

char* getString();

int main()
{
    char* c;
    while (true)
    { // Test for memory leak with an infinite loop
        c = getString(); // Retrieve the string
        printf("%s", c); // Print it to the console
    }
}

char* getString()
{
    char* c = new char[4];
    c[0] = 'H';
    c[1] = 'i';
    c[2] = '!';
    c[3] = 0;
    return c;
}
I've changed just a single line in getString. Instead of declaring a local array char c[4], I've declared a more generic char pointer char* c, then in a separate action, called new to allocate 4 bytes of memory, then assigned the pointer to those bytes to c.
Output:
Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!{infinite}
Memory Leak: Yes
Okay... so now we've got it working again, but I've got a memory leak!
Now let's look at what's going on in each example.
The last two examples are easiest to explain, so I'll do those first.
The answer lies in the question... "What is the difference between examples two and three?"
The third is actually the simplest of all. The first line can be expanded to two separate lines:
char* c;
c = new char[4];
c can take any pointer to any char. When we write new char[4], we are allocating a section in the program's data storage area of memory called the "Heap" for use as we see fit. The new[] operator takes two pieces of information - the "char" and the "4". The "char" tells it two things: how big each element is (one byte), and what type of pointer to hand back (char*). The "4" tells it how many elements to allocate.
Therefore, the operation new char[4] simply allocates a piece of memory, 4 bytes in length, and returns a char* pointing to its first byte.
This then happens to be stored in c, but it doesn't have to be. The most important part is that 4 bytes of memory have been allocated, and will continue to be allocated until explicitly deleted.
Ideally, any function which news memory should delete that memory also. But, when a function needs to pass that pointer back, it loses its ability to delete it later. Therefore, all we can do is hope that the calling function deletes it. Therefore, any calling function that retrieves an array should take it upon itself to delete that array when done with it. (In general).
#include <stdio.h>

char* getString();

int main()
{
    char* c;
    while (true)
    { // Test for memory leak with an infinite loop
        c = getString(); // Retrieve the string
        printf("%s", c); // Print it to the console
        // Important: Use delete[] on arrays, not delete
        delete[] c;
    }
}

char* getString()
{
    char* c = new char[4];
    c[0] = 'H';
    c[1] = 'i';
    c[2] = '!';
    c[3] = 0;
    return c;
}
All I've done to example three is add the line delete[] c after the printf statement.
Output:
Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!{infinite}
Memory Leak: No
It's cured! Not only do we see the intended message, and have no memory leak, but we also have a useful function that we can apply any algorithm to. The only proviso is that the onus of deletion is upon the caller.
Now let's see why example two didn't work.
In example two, instead of newing an area of memory and assigning a pointer, we created a local array. Where example three placed the actual data in the Heap memory, local arrays are stored on the stack. Local arrays are a lot like pointers, except they are declared with the following syntax:
char c[4];
This gives us the array c. Strictly speaking its type is char[4], not char*, but it converts ("decays") to a char* pointing at its first element whenever we use it like a pointer. It refers to a 4-byte block of memory on the stack. You have to understand that, unlike the newd array, this is of fixed size and is really no different from a normal local variable.
So far so good. The problem is that when we get back to main, it's garbage. You don't ever explicitly use the delete operation on a local array, do you? Yet they never create a memory leak. That's because the storage of a local array is released automatically at the end of its function (when the array falls out of scope).
The problem is, local arrays are local. Just as local variables are deleted and freed up automatically when the function ends, so too are local arrays. You'd get the exact same problem with a function like this:
int* retPtr()
{
    int i = 42;
    return &i;
}
What happens at the end of the retPtr function is, the variable i and its contents are freed up and reused. (They are popped from the stack.) Therefore the pointer to i (&i) points to garbage, and is useless. The same thing happens in example two - when the function ends, the array is automatically freed up and deleted, and the pointer to the array (being returned foolishly from the function) is useless - it simply points to an area on the stack.
Therefore, garbage is printed to the screen, because the contents of the array are no longer "Hi!" but some other random characters.
One positive benefit of local arrays is that, like local variables, they never create memory leak (since they are always automatically freed). Therefore, you should use them within functions, but should never return their pointers, or store their pointers in global or class-level pointer variables.
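Since passing arrays into functions is fine, one way to keep the benefits of a local array is to let the caller own it and have the function merely fill it in. The sketch below is an assumption on my part rather than one of the three examples: the name fillString and its bool return value are hypothetical, and it assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical variant of getString: the caller supplies the storage,
// so nothing created inside the function has to outlive it.
// Returns false if the buffer is missing or too small for "Hi!" plus
// the null terminator.
bool fillString(char* buffer, std::size_t size)
{
    const char message[] = "Hi!";   // 4 bytes including the 0-byte
    if (buffer == nullptr || size < sizeof message)
    {
        return false;
    }
    std::memcpy(buffer, message, sizeof message);
    return true;
}
```

A caller would declare char c[4]; on its own stack, call fillString(c, sizeof c), and print c as usual. The array is still freed automatically, but only when the caller's function ends, so it is valid for as long as the caller needs it.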
Now, let us take a look at example one in detail.
Example one used a string literal. I'll pop it up again for convenience:
#include <stdio.h>

char* getString();

int main()
{
    char* c;
    while (true)
    { // Test for memory leak with an infinite loop
        c = getString(); // Retrieve the string
        printf("%s", c); // Print it to the console
    }
}

char* getString()
{
    return "Hi!";
}
If you understand the theory above, you may be thinking, "huh?" We are returning a char* from a function, so you'd expect it to either free itself up at the end of the function (and print garbage), or be permanently allocated (and create memory leak). Yet it does neither - this example runs perfectly as it is.
The reason is that string literals aren't actually of type char*. In C++ a literal such as "Hi!" is an array of const char (here const char[4]), which converts to const char*. The difference is that, being const, its value never changes, and when a value never changes, there's no need to take up new memory each time. (Older compilers quietly allow the conversion to plain char* used here. Since C++11 it is no longer permitted, so on a modern compiler both getString and the variable c should be declared as const char*.)
String literals are placed in a special area of memory, usually called "rodata" (read-only data - as distinct from "stack" and "heap" used in examples two and three respectively). This area is special because if you try to write to it, or delete pointers to it, the behaviour is undefined, and on most systems you receive a fatal error. (If you don't, the compiler or platform has probably put the literals in writable memory, and relying on that would cause a lot of problems.) When you assign a string literal to a pointer, it simply points towards that special location in memory. It can be used to read from, but not write to or delete.
Therefore, as you can deduce, you do not get memory leak, because new memory is not being allocated each time. A single space in rodata, 4 bytes for "Hi!\0" is actually stored in the executable file, and this same space in memory is being pointed to each time.
You can extend the example to prove this.
#include <stdio.h>

bool b = false;

char* getString();

int main()
{
    char* c;
    while (true)
    { // Test for memory leak with an infinite loop
        b = !b; // Flip the state
        c = getString(); // Retrieve the string
        printf("%d\n", c); // Print the pointer address (portable code would use %p and cast c to void*)
    }
}

char* getString()
{
    if (b)
        return "Hi!";
    else
        return "Ho!";
}
Output:
4341796
4341792
4341796
4341792
4341796
{continues alternating ad infinitum}
As you can see, each time the string literal "Hi!" is returned, the pointer is 4341796, and each time the literal "Ho!" is returned, the pointer is 4341792. (The first call returns "Hi!", because b is flipped to true before getString runs.) The two are separated by 4 bytes. This is expected, but not guaranteed. What you can rely on in practice is that they will be the same pointers every time.
Note that if you changed the printf to "%d\n" on the third example, with memory leak, you'd find that the pointers would be different every time:
3281072
3281136
3281200
3281264
3281328
3281392
3281456
3281520
{continues to increase ad infinitum until running out of memory}
As you can see, the memory space is never being reused. If you print the pointers for example two, or the fixed example three (with the delete[] c line), you would typically find it reusing the same pointer every time, indicating no memory leak.
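If you want to repeat these pointer experiments on another compiler, the portable way to print an address is %p with a cast to void*. Here is a small sketch of that. The helper names formatAddress and liveBlocksDiffer are hypothetical, and the exact text %p produces is implementation-defined (often hexadecimal rather than the decimal numbers shown above).

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: write the address held in p into out using %p.
// %p expects a pointer to void, hence the cast.
void formatAddress(const char* p, char* out, std::size_t size)
{
    std::snprintf(out, size, "%p", static_cast<const void*>(p));
}

// Two blocks that are both still allocated can never share an address,
// which is why the leaking example keeps printing new pointers.
bool liveBlocksDiffer()
{
    char* first = new char[4];
    char* second = new char[4];   // first has not been deleted yet

    char a[32];
    char b[32];
    formatAddress(first, a, sizeof a);
    formatAddress(second, b, sizeof b);
    bool differ = std::strcmp(a, b) != 0;

    delete[] second;
    delete[] first;
    return differ;
}
```

Once a block has been deleted, the allocator is free to hand the same address out again, which is what the fixed example three tends to show.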
If you can't use string literals, you'd have to go with the solution from example three, which was to get the calling function to manually delete the string the function passed back to it. This is a bit cumbersome, because it has to be done any time the function is called, and is bad for abstraction. It is also very annoying to do simple things like print function output. Typically, when a function returns a string, something simple is needed such as a printf. Therefore, a caller should be able to do this:
printf("%s", getString());
Unfortunately, if getString is passing back an allocated array that needs to be deleted, this quick call-and-use method won't be viable, because the pointer is lost as soon as printf is finished. Therefore the caller needs to write three lines just to print the output and delete it. Clearly a more abstract solution is required.
One solution is, if your function is to be called repeatedly, store the pointer in a global or (in C++) class-level variable, and delete[] it the next time the function runs. That way, it will only take up a maximum of one string at a time.
#include <stdio.h>

char* c; // Global, so it starts out as a null pointer

char* getString();

int main()
{
    while (true)
        // Retrieve string and print it to the console in one line
        printf("%s", getString());
}

char* getString()
{
    // c is declared as global-level
    // Delete the old c before assigning a new one
    // (on the first call c is null, and delete[] on a null pointer does nothing)
    delete[] c;
    c = new char[4];
    c[0] = 'H';
    c[1] = 'i';
    c[2] = '!';
    c[3] = 0;
    return c;
}
Output:
Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!{infinite}
Memory Leak: No
Main function had to call delete[]: No
As you can see, by moving the delete statement into getString, we saved five lines from the main function! (We got rid of the declaration of c, the assignment to c, the delete[] call, and the braces in the loop.) So this solution is more abstract, and much easier to call each time. The one catch is that the returned pointer is only good until the next call to getString, so a caller that needs the string for longer has to copy it.
(Note: This is known as a "callee" onus rather than a "caller" onus - that is, the onus is on the callee class or functions to delete its allocations rather than the caller.)
Note that in C++, in an Object Oriented environment, you should use a private class-level variable instead, encapsulating the pointer as well.
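As a rough sketch of that class-level idea, something like the following could work. The class name Greeter and its member text_ are hypothetical, and the sketch assumes a C++11 compiler (for nullptr and = delete). Treat it as one possible shape, not the definitive design.

```cpp
#include <cstring>

// Hypothetical class: the pointer is private, so only the class itself
// ever deletes it (callee onus).
class Greeter
{
public:
    Greeter() : text_(nullptr) {}
    ~Greeter() { delete[] text_; }   // The last string is freed here

    // Copying would leave two objects deleting the same array, so forbid it
    Greeter(const Greeter&) = delete;
    Greeter& operator=(const Greeter&) = delete;

    // The returned pointer stays valid until the next call to getString,
    // or until the Greeter is destroyed, whichever comes first.
    const char* getString()
    {
        char* fresh = new char[4];   // Allocate first, in case new throws
        std::strcpy(fresh, "Hi!");
        delete[] text_;              // delete[] on a null pointer does nothing
        text_ = fresh;
        return text_;
    }

private:
    char* text_;
};
```

The caller can still write printf("%s", g.getString()); on a Greeter g without ever calling delete[], and unlike the global version, the final string is released when g goes out of scope.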
We have looked at three different ways to export C-style strings from functions: using literals (yes, if you can do that), using a local array (definitely not - will be freed up!), or by manually allocating memory and returning the pointer (that's what you do if you can't use literals - and the calling function must delete the string after use).
Each of these methods stored the data in a different spot in the memory address space. The string literals were stored in rodata, the local array was stored on the stack, and the allocated memory was stored on the heap. We saw the differences in these pieces of memory.
We've looked at ways to stop memory leak, and we've looked at how C/C++ handles string literals.
Finally, we looked at an example of a solution to avoid having the caller go to unnecessary lengths to delete the pointers that have been returned to it by using callee-delete ideas. Therefore we have increased our abstraction and encapsulation abilities and still written a function that can return any string (using an algorithm, not literals), without memory leak.