[Discuss] 'C' string tokenizer for those who hate strtok
Paul Nienaber
phox at phox.ca
Thu Jun 29 11:53:55 PDT 2006
p.willis at telus.net wrote:
> Quoting Paul Nienaber <phox at phox.ca>:
>
>> p.willis at telus.net wrote:
>>
>>> Quoting p.willis at telus.net:
>>>
>>>> Quoting Paul Nienaber <phox at phox.ca>:
>>>>
>>>>> Buffer from nowhere? POSIX has mandated strtok_r() for like... ever.
>>>>> strtok() _is_ stupid. It's also about one more line to use strchr() or
>>>>> one can use BSD strsep(), or whatever...
>>>>>
>>>>> ~p
>>>>>
>>>> Paul,
>>>>
>>>> It's fluff. It's a learning exercise for 'C' linked lists for beginners.
>>>>
>>>> It's entertainment...or would you rather read about partitioning hard
>>>> drives 200 more times?
>>>>
>>>> Peter
>>>>
>>> I should also note that this technique is better than all
>>> of the above mentioned tokenization routines in that it doesn't
>>> destroy the original data. It always makes me wonder about
>>> libraries when the actual 'man pages' say to avoid the routine
>>> if possible.
>>> (i.e., strtok, strtok_r, and strsep all come with this warning)
>>>
>>> A second point regarding the storage is that the deallocation
>>> is obviated by this technique, reducing memory leaks. But that's
>>> splitting hairs since free() also works [most of the time] for
>>> some of the other routines.
>>>
>> Yeah. It's way better to allocate another buffer for every token,
>> rather than copying the whole thing and delimiting it, which of course
>> makes your technique "not better", because it incurs a whole pile more
>> calls to malloc() and friends... </rant> (but you were the one who
>> decided to use the word "better"...)
>>
>> I should come clean here and mention that I've taught C at UVic on at
>> least one occasion. I won't even go into the C-specific issues here.
>>
>> ~p
>>
>
> How is allocating 5 buffers any different than allocating one?
> I'd really like to be informed regarding malloc since most
> of the Linux system uses it to allocate memory at anything
> above the kernel level. What's wrong with malloc and friends?
>
There's overhead: every allocation carries its own bookkeeping. Even
when it's all done in userspace, the GNU implementation is nasty, and
by allocating lots of little bits you fragment the heap and can end up
making future calls to malloc() slower...
> I think dynamically allocating memory for storage
> is a pretty good idea. That's what programs *should* do.
> Otherwise we end up with buffer overrun exploits, etc.
>
It is, but allocating a "chunk" instead of calling malloc() a gazillion
times is far more efficient, especially when you're pretty much being
handed a way of having the buffers neatly packed into the chunk.
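For comparison, here is a minimal sketch of the "one chunk" approach: copy the input once, terminate each token inside that copy, and record pointers to the tokens. The original string is left untouched, and there are two allocations in total instead of one per token. The names `token_list`, `tokenize_copy` and `token_list_free` are hypothetical, not from either poster's code. It uses the standard strspn()/strcspn() so it builds as plain ISO C; running strtok_r() over the copy would work the same way.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: one private copy of the input, plus
 * pointers to the tokens inside that copy. */
typedef struct {
    char *chunk;    /* copy of the input, delimited in place */
    char **tokens;  /* tokens[i] points into chunk */
    size_t count;
} token_list;

/* Split `input` on any character in `delims`, leaving `input` untouched.
 * Uses two allocations in total, however many tokens there are.
 * Returns 0 on success, -1 on allocation failure. */
int tokenize_copy(const char *input, const char *delims, token_list *out)
{
    size_t len = strlen(input);
    /* n tokens need at least 2n - 1 characters (one per token plus a
     * delimiter between each), so len / 2 + 1 slots is always enough. */
    size_t max_tokens = len / 2 + 1;
    char *p;

    out->count = 0;
    out->chunk = malloc(len + 1);
    out->tokens = malloc(max_tokens * sizeof *out->tokens);
    if (out->chunk == NULL || out->tokens == NULL) {
        free(out->chunk);
        free(out->tokens);
        out->chunk = NULL;
        out->tokens = NULL;
        return -1;
    }
    memcpy(out->chunk, input, len + 1);    /* includes the '\0' */

    p = out->chunk;
    for (;;) {
        p += strspn(p, delims);            /* skip leading delimiters */
        if (*p == '\0')
            break;
        out->tokens[out->count++] = p;     /* token starts here */
        p += strcspn(p, delims);           /* find where it ends */
        if (*p == '\0')
            break;
        *p++ = '\0';                       /* terminate it inside the copy */
    }
    return 0;
}

/* Release both allocations. */
void token_list_free(token_list *list)
{
    free(list->chunk);
    free(list->tokens);
    list->chunk = NULL;
    list->tokens = NULL;
    list->count = 0;
}
```

Usage: `tokenize_copy("alpha, beta,,gamma", ", ", &t)` yields three tokens, "alpha", "beta" and "gamma", and a single `token_list_free(&t)` releases everything.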
> As for C-Specific issues I'm not sure what you mean.
> Does my 'C' code have punctuation problems? :)
>
Ok, answers to that then: (not meant to be offensive)
Your use of void* is ugly: don't cast the return value of malloc(), and
replace all those void*'s in your structs with pointers to the proper
struct types.
There is no reason to be using: typedef struct foo_ {} foo;
instead of: typedef struct {} foo;
Unless of course you're taking advantage of it to have a pointer to
itself in there somewhere, as mentioned above.
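A minimal sketch of the typed, self-referential struct described above, assuming a singly linked list of token strings. The tag name is only there because the struct points to its own type, and malloc()'s result is assigned without a cast. The names `token_node`, `token_push` and `token_nodes_free` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical node type. The tag (struct token_node) lets the struct
 * hold a typed pointer to its own kind; a struct with no self-reference
 * could use the anonymous form typedef struct { ... } name; instead. */
typedef struct token_node {
    char *text;               /* heap-allocated token, owned by the node */
    struct token_node *next;  /* typed link instead of void* */
} token_node;

/* Append a node that takes ownership of `text`.
 * Returns the new node, or NULL if allocation fails (text is not freed). */
token_node *token_push(token_node **head, char *text)
{
    token_node *node = malloc(sizeof *node); /* no cast: void* converts implicitly in C */
    if (node == NULL)
        return NULL;
    node->text = text;
    node->next = NULL;

    while (*head != NULL)        /* walk the typed links to the tail */
        head = &(*head)->next;
    *head = node;
    return node;
}

/* Free every node and the token text each one owns. */
void token_nodes_free(token_node *head)
{
    while (head != NULL) {
        token_node *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

Because `next` has the real type, code like `list->next->text` compiles without any casts, and the compiler can catch a wrong pointer being linked in.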
Magic numbers: What's with the extra 4 bytes you've allowed past the end
of each token? I didn't look much, but the only thing I saw it being
used for is to store at least one '\0'. Using memset there is also
pointless... just allocate length_of_string + 1 bytes and use
string[length_of_string] = '\0'; (or you can use calloc)
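A minimal sketch of the two options above, assuming a token is given as a pointer plus a length: allocate exactly length + 1 bytes, then either write the terminator at index length (the last byte of the buffer) or let calloc() zero-fill it. Both helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of src into a buffer of exactly len + 1 bytes.
 * The terminator goes at index len, the last valid index, so no
 * padding bytes and no memset() are needed. */
char *copy_token(const char *src, size_t len)
{
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, src, len);
    s[len] = '\0';
    return s;
}

/* Alternative: calloc() zero-fills, so the byte at index len is
 * already '\0' once the first len bytes have been copied in. */
char *copy_token_calloc(const char *src, size_t len)
{
    char *s = calloc(len + 1, 1);
    if (s == NULL)
        return NULL;
    memcpy(s, src, len);
    return s;
}
```

The calloc() version pays to zero the whole buffer only to overwrite most of it, so the explicit terminator is usually the simpler choice.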
~p