Tokenizing a String in C++

Asked By 250 points N/A Posted on -

Hi,

I am trying to write a function that splits a sentence into its words and prints them in reverse order. I have been doing something like this:

int main()
{
    string s="Hello there how are you?";
    string temp="";
    vector<string> tokens;
    for(int i=0;i<s.size();++i)
    {
        if(s[i]!=' ')
        temp+=s[i];
        else
        {
            tokens.push_back(temp);
            temp="";
        }
    }

    for(int i=tokens.size()-1;i>=0;--i)
    cout<<tokens[i]<<endl;
    return 0;
}

I am hand-coding everything. The problem is that there is always one token missing, and I don't get why. Also, please tell me if there is a convenient yet simple way to tokenize a string using a library in pure C++.

Best Answer by Apocalypse
Answered By 0 points N/A #91554

Just execute your code in your head and see what happens after the last push_back() inside the loop: temp is reset to "". You then read one character after another and append each to temp. When the end of the string is reached, you simply leave the loop. temp is holding the last word, but you never pushed it back, so it doesn't show up. Just add another push_back(temp) before the final for loop and your problem will be solved.
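As a minimal sketch of that fix, here is the hand-coded loop wrapped in a function with the extra push_back after the loop. The function name split_on_spaces is hypothetical, chosen only for this illustration. Like the original loop it splits on single spaces only, so consecutive spaces would still produce empty tokens.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the asker's loop plus the missing final push_back.
std::vector<std::string> split_on_spaces(const std::string& s)
{
    std::vector<std::string> tokens;
    std::string temp;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        if (s[i] != ' ')
            temp += s[i];            // still inside a word
        else
        {
            tokens.push_back(temp);  // a space ends the current word
            temp.clear();
        }
    }
    tokens.push_back(temp);          // the last word has no space after it
    return tokens;
}
```

Calling split_on_spaces on the sentence from the question should now give five tokens, with "you?" as the last one.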

Answered By 250 points N/A #91555

Thanks shabab. That was very foolish of me, a simple mistake. Anyway, any more suggestions for simpler tokenizing?

Best Answer
Answered By 0 points N/A #91556

You can do the same thing with less room for mistakes using stringstream. Look at the code:

    stringstream ss(s);            // needs #include <sstream>
    while(ss>>temp)                // operator>> skips whitespace and reads one word
        tokens.push_back(temp);

It does just the same as yours, in 3 lines.
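For anyone who wants to compile it directly, here is one way the stringstream approach might look as a complete function. The name split_words is made up for this sketch. It uses std::istringstream, the input-only variant of stringstream.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: whitespace tokenizer built on std::istringstream.
std::vector<std::string> split_words(const std::string& s)
{
    std::istringstream ss(s);
    std::vector<std::string> tokens;
    std::string temp;
    // Extraction skips leading whitespace and stops at the next whitespace,
    // so runs of spaces, tabs or newlines never produce empty tokens.
    while (ss >> temp)
        tokens.push_back(temp);
    return tokens;
}
```

Unlike the hand-coded loop, this needs no special handling for the last word: the loop ends only when extraction fails at the end of the input.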

Answered By 250 points N/A #91557

Wow Apocalypse. That's more than overkill. +1 for that string stream thing. Maybe I shall stick to it. Any further info on this?

Answered By 0 points N/A #91558

Stringstream is a very good option for simplicity, and it is useful for more than string tokenization: you can even use it to convert a string to an int! Here is a datasheet.
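The string-to-int conversion mentioned above could be sketched roughly like this. The helper name to_int is an assumption made for illustration, not something from the thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true and stores the value in `out` when
// `text` starts with an integer (leading whitespace is skipped).
// Trailing characters after the number are ignored, so "42abc" gives 42.
bool to_int(const std::string& text, int& out)
{
    std::istringstream ss(text);
    return static_cast<bool>(ss >> out);  // false if nothing could be parsed
}
```

The same pattern should work for other types that support operator>>, such as double.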

If you want to tokenize on something more than just whitespace, use strtok. It is old but powerful. Here is a strtok version of your program:

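    // Needs <cstring>, <iostream>, <string> and <vector>; goes inside main().
    string s="Hello there,how are you?good<yes/no>.";
    vector<string> tokens;
    char str[100];                      // strtok modifies its input, so work on a copy
    strcpy(str,s.c_str());              // s must be shorter than 100 characters
    char * pch;
    pch = strtok (str," ?,.-<>/");      // first call: pass the buffer
    while (pch != NULL)
    {
      tokens.push_back(pch);
      pch = strtok (NULL, " ?,.-<>/");  // later calls: NULL continues the same buffer
    }
    for(int i=tokens.size()-1;i>=0;--i)
        cout<<tokens[i]<<endl;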

Here the call strtok(str, " ?,.-<>/") does the magic: the first argument is the string to be tokenized, and the second is the list of delimiter characters. Each later call passes NULL as the first argument to continue with the same string. The program given here gives you this output:

no
yes
good
you
are
how
there
Hello

You can see that the string is split wherever any of the characters in " ?,.-<>/" appears.
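If you would rather avoid the fixed-size buffer and the hidden state that strtok keeps between calls, the same splitting can be sketched with std::string's own search functions. This is an alternative technique, not strtok itself, and the name split_any is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` at any character found in `delims`,
// skipping runs of delimiters so no empty tokens are produced.
std::vector<std::string> split_any(const std::string& s, const std::string& delims)
{
    std::vector<std::string> tokens;
    // Find the first character that is not a delimiter.
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        // The token runs up to the next delimiter (or the end of the string).
        std::string::size_type end = s.find_first_of(delims, start);
        tokens.push_back(s.substr(start, end - start));
        // Skip the delimiters that follow; yields npos when nothing is left.
        start = s.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Called with the example string and the delimiter list " ?,.-<>/", this should produce the same eight tokens as the strtok program, in forward order.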

Answered By 250 points N/A #91559

Thanks a lot. That's what I was expecting.
