Hi,
I am trying to write a function that splits a sentence into its words and prints them backwards. I have been doing something like this:
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
   string s="Hello there how are you?";
   string temp="";
   vector<string> tokens;
   for(int i=0;i<s.size();++i)
   {
       if(s[i]!=' ')
       temp+=s[i];
       else
       {
           tokens.push_back(temp);
           temp="";
       }
   }
   for(int i=tokens.size()-1;i>=0;--i)
   cout<<tokens[i]<<endl;
   return 0;
}
I am hand coding everything. The problem is that there is always one token missing, and I don't get why. Also, please tell me if there is a convenient yet simple way to tokenize a string in pure C++.
Tokenizing a String in C++
Just run your code in your head and see what happens after the second-to-last push_back(): temp is reset to "". You then read a char and add it to temp, read another char and add it to temp, and so on. When the string has been read to the end, you simply leave the loop. At that point temp is holding the last word, but because there is no space after it, you never pushed it back! So it doesn't show up. Just add another push_back(temp) before the final for loop and your problem will be solved.
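To make that concrete, here is a minimal sketch of the same loop with the extra push_back added. The function name split_on_spaces is made up for illustration, and the check for an empty temp is an added assumption so that a trailing space does not produce an empty token.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits s on single spaces, by hand.
std::vector<std::string> split_on_spaces(const std::string& s)
{
    std::vector<std::string> tokens;
    std::string temp;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        if (s[i] != ' ')
            temp += s[i];
        else
        {
            tokens.push_back(temp);
            temp = "";
        }
    }
    // The loop only pushes when it sees a space, so the last word
    // is still sitting in temp here. Push it unless it is empty.
    if (!temp.empty())
        tokens.push_back(temp);
    return tokens;
}
```

Note that two spaces in a row still produce an empty token in the middle, just like in the original loop.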
Tokenizing a String in C++
Thanks shabab. That was very foolish of me, a simple mistake. Anyway, any more suggestions for simpler tokenizing?
Tokenizing a String in C++
You can do the same thing with less risk of mistakes using stringstream (from the <sstream> header). Look at the code:
stringstream ss(s);
   while(ss>>temp)
       tokens.push_back(temp);
It's the same as yours, done in 3 lines.
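For completeness, the three lines can be wrapped up roughly like this. It is only a sketch: split_whitespace is a made-up name, and istringstream is used instead of stringstream because the stream is only read from.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits s on any run of whitespace.
std::vector<std::string> split_whitespace(const std::string& s)
{
    std::istringstream ss(s);
    std::vector<std::string> tokens;
    std::string temp;
    // operator>> skips leading whitespace and reads one word at a time;
    // the loop ends when no more words can be extracted.
    while (ss >> temp)
        tokens.push_back(temp);
    return tokens;
}
```

One difference from the hand-coded loop: operator>> skips any amount of whitespace (spaces, tabs, newlines), so repeated spaces never produce empty tokens and the last word is never lost.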
Tokenizing a String in C++
Wow Apocalypse. That's more than overkill. +1 for that string stream thing. Maybe I shall stick to it. Any further info on this?
Tokenizing a String in C++
Stringstream is a very good option for simplicity. It is also useful for things other than string tokenization; you can even use it to convert a string to an int! Here is a datasheet.
If you want to tokenize on something more than just white space, use strtok. It is old but powerful. Here is a strtok version of your program:
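As a rough illustration of the string to int conversion, something like the following should work. The helper name to_int is made up for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: tries to read an int from the start of text.
// Returns true on success and stores the value in out.
bool to_int(const std::string& text, int& out)
{
    std::istringstream ss(text);
    // The extraction fails (and the stream tests false) if text
    // does not begin with something that parses as an int.
    return static_cast<bool>(ss >> out);
}
```

Be aware that this sketch accepts input such as "12abc" and yields 12, because extraction stops at the first character that cannot be part of the number.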
   string s="Hello there,how are you?good<yes/no>.";
   string temp="";
   vector<string> tokens;
   char str[100];
   strcpy(str,s.c_str());
   char * pch;
   pch = strtok (str," ?,.-<>/");
   while (pch != NULL)
   {
     tokens.push_back(pch);
     pch = strtok (NULL, " ?,.-<>/");
   }
   for(int i=tokens.size()-1;i>=0;--i)
       cout<<tokens[i]<<endl;
Here the call strtok (str," ?,.-<>/"); does the magic. The first argument is the string to be tokenized, and the second one is the list of delimiter characters. On every later call you pass NULL as the first argument so that strtok continues with the same string. The program given here gives you this output:
no
yes
good
you
are
how
there
Hello
You can see that the string is split wherever any of the characters in " ?,.-<>/" appears.
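If the input can be longer than the fixed char str[100] buffer, one possible variation is to copy the string into a vector<char> first. This is just a sketch, and split_any is a made-up name, not a library function.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits s wherever any character of delims appears.
std::vector<std::string> split_any(const std::string& s, const char* delims)
{
    // strtok writes '\0' into its input, so work on a writable,
    // null-terminated copy sized to fit the whole string.
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    char* pch = std::strtok(&buf[0], delims);
    while (pch != NULL)
    {
        tokens.push_back(pch);
        pch = std::strtok(NULL, delims);  // NULL: continue with the same buffer
    }
    return tokens;
}
```

Keep in mind that strtok keeps its position in hidden static state, so it is not safe to tokenize two strings at the same time or to call it from several threads.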
Tokenizing a String in C++
Thanks a lot. That's what I was looking for.