I am trying to read a text file, store each line in a vector, and then split each line so the individual words end up in another vector. Basically, I want to read a file and then get the frequency count for common words without using maps.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    // create a simulated file with 3 lines of text, 20 words total
    std::vector<std::string> read_file { "Now is the time for all good men",
                                         "Four Score and Seven Years Ago",
                                         "Split Entire Lines Into Individual Words" };

    // blank vector to hold the individual word tokens
    std::vector<std::string> words;

    // loop through all the 'read' lines to tokenize each one
    for (const auto& itr : read_file)
    {
        // create a stringstream for using stream operators to read each line
        std::istringstream line(itr);

        // a string to hold each read word
        std::string word;

        // loop, reading each individual word
        while (line >> word)
        {
            words.push_back(word);
        }
    }

    // verify the correct number of words were tokenized (20, right?)
    std::cout << words.size() << '\n';
}
The main issue is what constitutes a 'word'. Is dog the same as Dog, the same as dog., the same as dog!, the same as "dog, etc.? Do you need to strip all punctuation from a word and convert it to, say, lowercase for the purposes of counting frequency?
std::getline() will obtain one line from the file, and then, using a std::istringstream and extraction (>>), each whitespace-delimited token can be extracted (as per George's post above). The issue is then what to do with these tokens...
How are you going to get the frequency count from these vectors of 'words'? Why not use std::map? It's made for exactly this.