[Discuss] 'C' string tokenizer for those who hate strtok

Brian Quinlan brian at sweetapp.com
Fri Jun 30 02:40:53 PDT 2006


David Bronaugh wrote:
> Brian Quinlan wrote:
> Eh? How hard is it to understand a 1-liner as opposed to a 300-line mass 
> of C? Or even a 10-line Python program?

> What's so hard about @foo = split(/\//, $str)?

Nothing. I actually missed part of the semantics of the C program: I had
thought that zero-length tokens were suppressed. The Python equivalent becomes:

foo = str.split('/')
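For comparison, here is a minimal C sketch of the same semantics, with
zero-length fields kept as Python's str.split('/') keeps them. The names
split_spans and struct span are hypothetical and not taken from the code
under discussion. It records start/length pairs instead of copying each
field, so it does no allocation at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A field is described by where it starts and how long it is. */
struct span {
    const char *start;
    size_t len;
};

/* Split s on sep, keeping zero-length fields (like Python's str.split).
 * Writes at most max spans into out and returns the total number of
 * fields, which is always (number of separators + 1). No allocation. */
size_t split_spans(const char *s, char sep, struct span *out, size_t max)
{
    size_t n = 0;

    assert(sep != '\0'); /* strchr would otherwise match the terminator */
    for (;;) {
        const char *end = strchr(s, sep);
        size_t len = end ? (size_t)(end - s) : strlen(s);

        if (n < max) {
            out[n].start = s;
            out[n].len = len;
        }
        n++;
        if (end == NULL)
            return n;
        s = end + 1; /* step past the separator */
    }
}
```

Splitting "usr//local/" this way yields four fields, "usr", "", "local"
and "", which matches what "usr//local/".split('/') returns in Python.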

>> > Not the point.
>>
>> What is YOUR point? My point is that most of this arguing about 
>> software engineering at the C level is moot because you are wasting 
>> your time expressing such an algorithm in C in the first place - you 
>> found a significant bug in the code that you rewrote and this is for a 
>> problem that is completely trivial.
> The point is that you're comparing apples and oranges. Perl and Python 
> are both significantly slower at runtime and significantly faster to write.

Whether they are significantly slower at runtime depends on the 
algorithm that you are expressing and on how your code is written. In 
this case, for example, my Python code can split a 3,889-character 
string into 1000 substrings 10000 times in 3.89 seconds; your C code 
takes 13 times longer. I would expect that all the calls to malloc are 
killing you: Python pre-allocates memory in medium-size chunks and 
manages its own pools, so it probably ran my entire test using a single 
malloc call, whereas the C code required 1000 * 2 * 10000 (20 million) 
malloc calls, plus a corresponding number of free calls.

But I agree that C code can always be made faster than Python code if 
you are willing to spend enough time optimizing it. In this case, you 
could pre-allocate a single buffer of len(string) * 2 bytes to store 
the tokens.
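A minimal sketch of the pre-allocation idea, assuming the fields may point
into a private copy of the input. Rather than the len(string) * 2 estimate,
this variant counts the separators first and then makes exactly one malloc
for the pointer array plus the copy. The names tokenize_one_alloc and
struct tokens are hypothetical; this is not the original poster's code and
has not been benchmarked here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* fields[i] points into the same block as the array itself. */
struct tokens {
    char **fields;  /* count pointers to NUL-terminated fields */
    size_t count;
};

/* Split s on sep (keeping empty fields) using a single malloc. The block
 * holds the pointer array followed by a copy of s in which every separator
 * is overwritten with '\0'. Release everything with free(t.fields).
 * On allocation failure, returns fields == NULL and count == 0. */
struct tokens tokenize_one_alloc(const char *s, char sep)
{
    struct tokens t = { NULL, 0 };
    size_t len = strlen(s);
    size_t count = 1;

    assert(sep != '\0');
    for (size_t i = 0; i < len; i++)
        if (s[i] == sep)
            count++;

    /* Pointer array first, so it gets malloc's alignment; chars after. */
    char **block = malloc(count * sizeof *block + len + 1);
    if (block == NULL)
        return t;

    char *copy = (char *)(block + count);
    memcpy(copy, s, len + 1); /* includes the terminating NUL */

    size_t n = 0;
    block[n++] = copy;
    for (size_t i = 0; i < len; i++) {
        if (copy[i] == sep) {
            copy[i] = '\0';            /* terminate the current field */
            block[n++] = copy + i + 1; /* next field starts after it */
        }
    }
    t.fields = block;
    t.count = count;
    return t;
}
```

The trade-off is one extra pass over the string to count separators, in
exchange for replacing two mallocs per token with one malloc per call.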

> If you're going to write in C, it had better be clean, well-tested, and 
> high performance.

Right. So, as I said before, C is pointless in this case, since the 
Python and Perl versions are both cleaner, better tested, and faster. 
Actually, I didn't test the Perl version for performance; I'm just 
assuming that it is about the same as Python.

> Otherwise -there is no point-.

Exactly!

> Write Ocaml. Faster than C to write, and runs at around the same speed 
> as C. Oh, and it actually has error checking.

I don't know what you mean by the last sentence.

Cheers,
Brian

