[Discuss] 'C' string tokenizer for those who hate strtok

Paul Nienaber phox at phox.ca
Thu Jun 29 11:53:55 PDT 2006


p.willis at telus.net wrote:
> Quoting Paul Nienaber <phox at phox.ca>:
>
>   
>> p.willis at telus.net wrote:
>>     
>>> Quoting p.willis at telus.net:
>>>
>>>   
>>>       
>>>> Quoting Paul Nienaber <phox at phox.ca>:
>>>>
>>>>     
>>>>         
>>>>> Buffer from nowhere?  POSIX has mandated strtok_r() for like... ever. 
>>>>> strtok() _is_ stupid.  It's also about one more line to use strchr() or
>>>>> one can use BSD strsep(), or whatever...
>>>>>
>>>>> ~p
>>>>>       
>>>>>           
>>>> Paul,
>>>>
>>>> It's fluff. It's a learning exercise for 'C' linked lists for beginners.
>>>>
>>>> It's entertainment...or would you rather read about partitioning hard
>>>> drives 200 more times.
>>>>
>>>> Peter
>>>>     
>>>>         
>>> I should also note that this technique is better than all 
>>> of the above mentioned tokenization routines in that it doesn't
>>> destroy the original data. It always makes me wonder about
>>> libraries when the actual 'man pages' say to avoid the routine
>>> if possible.
>>> (ie: strtok, strtok_r, and strsep all come with this warning)
>>>
>>> A second point regarding the storage is that the deallocation
>>> is obviated by this technique reducing memory leaks. But that's
>>> splitting hairs since free() also works [most of the time] for
>>> some of the other routines.
>>>   
>>>       
>> Yeah.  It's way better to allocate another buffer for every token,
>> rather than copying the whole thing and delimiting it, which of course
>> makes your technique "not better", because it incurs a whole pile more
>> calls to malloc() and friends...  </rant>  (but you were the one who
>> decided to use the word "better"...)
>>
>> I should come clean here and mention that I've taught C at UVic on at
>> least one occasion.  I won't even go into the C-specific issues here.
>>
>> ~p
>>     
>
> How is allocating 5 buffers any different than allocating one?
> I'd really like to be informed regarding malloc since most
> of the linux system uses it to allocate memory at anything 
> above the kernel level. What's wrong with malloc and friends?
>   
There's overhead.  Even when it's all done in userspace, the GNU
implementation is nasty, and by allocating more little bits, you can end
up making future calls to malloc() slower...
> I think dynamically allocating memory for storage
> is a pretty good idea. That's what programs *should* do.
> Otherwise we end up with buffer overrun exploits, etc..
>   
It is, but allocating a "chunk" instead of calling malloc() a gazillion
times is far more efficient, especially when you're pretty much being
handed a way of having the buffers neatly packed into the chunk.
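
A minimal sketch of the single-chunk approach described above, assuming C99.
The names token_list, token_list_init and token_list_free are made up for
illustration and do not come from the code under discussion. The input is
copied once, delimiters are overwritten in the copy only, and every token
is a pointer into that one buffer, so the caller's string stays intact and
the malloc() count does not grow with the number of tokens.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: one private copy of the input plus an array of
 * pointers into that copy. */
typedef struct {
    char  *chunk;   /* copy of the input, delimiters overwritten with '\0' */
    char **tokens;  /* tokens[i] points into chunk */
    size_t count;   /* number of tokens found */
} token_list;

/* Split src on any character in delims, skipping empty tokens.
 * Returns 0 on success, -1 on allocation failure.
 * The caller's string is never modified. */
int token_list_init(token_list *tl, const char *src, const char *delims)
{
    assert(tl != NULL && src != NULL && delims != NULL);

    size_t len = strlen(src);
    tl->count = 0;
    tl->chunk = malloc(len + 1);
    /* Worst case is one-character tokens split by single delimiters
     * ("a,b,c"), so len / 2 + 1 pointer slots are always enough. */
    tl->tokens = malloc((len / 2 + 1) * sizeof *tl->tokens);
    if (tl->chunk == NULL || tl->tokens == NULL) {
        free(tl->chunk);
        free(tl->tokens);
        tl->chunk = NULL;
        tl->tokens = NULL;
        return -1;
    }
    memcpy(tl->chunk, src, len + 1);

    char *p = tl->chunk;
    for (;;) {
        p += strspn(p, delims);        /* skip leading delimiters */
        if (*p == '\0')
            break;
        tl->tokens[tl->count++] = p;   /* a token starts here */
        p += strcspn(p, delims);       /* advance to its end */
        if (*p == '\0')
            break;
        *p++ = '\0';                   /* terminate it, in the copy only */
    }
    return 0;
}

/* Two free() calls release everything, however many tokens there were. */
void token_list_free(token_list *tl)
{
    free(tl->tokens);
    free(tl->chunk);
}
```

Whatever the token count, this costs two allocations and two calls to
free(). The pointer array could be folded into the same chunk as well, at
the price of some alignment bookkeeping.
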
> As for C-Specific issues I'm not sure what you mean. 
> Does my 'C' code have punctuation problems? :)
>   
Ok, answers to that then: (not meant to be offensive)

Your use of void* is ugly:  Don't cast the return value of malloc(), and
replace all those void*'s in your structs with pointers to the proper
struct types.

There is no reason to be using: typedef struct foo_ {} foo;
instead of: typedef struct {} foo;
Unless of course you're taking advantage of it to have a pointer to
itself in there somewhere, as mentioned above.
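
To make the two points above concrete, here is a hedged sketch, assuming
C99. The original structs are not shown in this thread, so token_node,
token_queue and both functions are hypothetical names. It shows a tagged
struct used only because it points to itself, an untagged typedef where no
self-reference exists, and malloc() used without a cast.

```c
#include <assert.h>
#include <stdlib.h>

/* The tag is needed here: the struct refers to itself before the typedef
 * name exists. */
typedef struct token_node {
    struct token_node *next;   /* typed link instead of void* */
    const char *text;          /* not owned by the node in this sketch */
} token_node;

/* No self-reference, so no tag is required. */
typedef struct {
    token_node *head;
    token_node *tail;
    size_t count;
} token_queue;

/* Append text to the queue. Returns 0 on success, -1 on allocation failure. */
int token_queue_push(token_queue *q, const char *text)
{
    assert(q != NULL);

    token_node *n = malloc(sizeof *n);   /* no cast on malloc() */
    if (n == NULL)
        return -1;
    n->next = NULL;
    n->text = text;
    if (q->tail != NULL)
        q->tail->next = n;               /* no cast: next already has the right type */
    else
        q->head = n;
    q->tail = n;
    q->count++;
    return 0;
}

/* Free every node; the text pointers are left alone because the nodes
 * do not own them. */
void token_queue_clear(token_queue *q)
{
    assert(q != NULL);

    token_node *n = q->head;
    while (n != NULL) {
        token_node *next = n->next;
        free(n);
        n = next;
    }
    q->head = q->tail = NULL;
    q->count = 0;
}
```

With typed links the compiler rejects an attempt to hang the wrong kind of
object off next, which a void* would silently accept.
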

Magic numbers:  What's with the extra 4 bytes you've allowed off the end
of each token?  I didn't look closely, but the only thing I saw them being
used for is to store at least one '\0'.  Using memset there is also
pointless... just use string[length_of_string] = '\0';  (or you can
use calloc)
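
A short sketch of the exact-size allocation, assuming C99. The names
copy_token and copy_token_calloc are hypothetical. For a token of len
bytes, a buffer of len + 1 bytes has its last valid index at len, and that
is where the single terminator goes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes starting at start into a fresh buffer of exactly
 * len + 1 bytes and terminate it with one assignment.
 * Returns NULL on allocation failure. */
char *copy_token(const char *start, size_t len)
{
    assert(start != NULL);

    char *s = malloc(len + 1);    /* one extra byte, for the '\0' */
    if (s == NULL)
        return NULL;
    memcpy(s, start, len);
    s[len] = '\0';                /* index len is the last byte of the buffer */
    return s;
}

/* Same result via calloc(): the buffer starts out zeroed, so the byte
 * after the copied data is already '\0'. */
char *copy_token_calloc(const char *start, size_t len)
{
    assert(start != NULL);

    char *s = calloc(len + 1, 1);
    if (s == NULL)
        return NULL;
    memcpy(s, start, len);
    return s;
}
```

Either version needs no padding bytes and no memset(). The calloc() form
trades the explicit assignment for zeroing the whole buffer up front.
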


~p

