bool hashtable::getFileContent(string fileName)
{
    ifstream in(fileName.c_str());
    // Check if the stream is valid
    if (!in)
    {
        cerr << "Cannot open the file: " << fileName << endl;
        return false;
    }
    string str;
    // Read the next line from the file until it reaches the end.
    while (getline(in, str))
    {
        // If the line is non-empty, save it in the table
        if (str.size() > 0)
            this->fileInsert(str);
    }
    // Close the file
    in.close();
    return true;
}
bool hashtable::fileInsert(string str)
{
    int key1 = 0;
    string key, value;
    stringstream tmpstream(str);
    if (!getline(tmpstream, key, ','))
    {
        cerr << "no delimiter key" << endl;
        return false;
    }
    key1 = atoi(key.c_str()); // string to int
    if (!getline(tmpstream, value))
    {
        cerr << "no value" << endl;
        return false;
    }
    hashtable::insert(key1, value);
    return true;
}
In fact, if I insert 500 lines the CPU rises to 100%.
If you are on Linux, I would consider low-level parsing (along with memory-mapped files; mapped files are not faster per se, they just replace a stream with raw data). On Windows it isn't worth it because of \r\n (carriage return + line feed) line endings. So I recommend you keep your code as it is, because your code is sort of safe, which is important (only "sort of" safe because you don't actually handle the error, but at least you print it).
If you could change how the data is stored, there is room for improvement; you could do something really low level with binary data (though that is awkward because of the strings). But if you plan on modifying the file with a text editor, I recommend using INI or JSON files instead, not for speed (they could be faster than your code, who knows), but because they are convenient to modify, and plenty of other code can benefit from a format like JSON, for example a configuration file.
One of the first things I see that could use improvement is that you are passing std::strings by value instead of by reference/const reference.
Next, you have several unnecessary comparisons and conversions. There is no need to convert the std::string into a C-string; just ensure you're compiling as "Modern" C++ (C++14 or higher).
Instead of retrieving your number into a string, retrieve it into a variable of the proper type, and avoid the atoi() function whenever possible. That C function is frowned upon even in modern C because it can silently fail.
bool hashtable::getFileContent(const std::string& fileName)
{
    ifstream in(fileName);
    if (!in)
    {
        cerr << "Cannot open the file: " << fileName << endl;
        return false;
    }
    string str;
    // Read the next line from the file until it reaches the end.
    while (getline(in, str))
    {
        fileInsert(str);
    }
    // in.close(); Not needed: the destructor closes the file.
    return true;
}
bool hashtable::fileInsert(const std::string& str)
{
    int key;
    char delimiter;
    string value;
    stringstream tmpstream(str);
    tmpstream >> key >> delimiter;
    getline(tmpstream, value);
    // If either conversion failed then the stream is in a fail state.
    if (!tmpstream)
    {
        cerr << "no value" << endl;
        return false;
    }
    hashtable::insert(key, value);
    return true;
}
atoi and >> into an int are both sluggish; they invoke a complicated parser that has to handle a bunch of formats and do a lot of work. A custom high-speed version that works for YOUR format can be significantly faster than either. I got a major lift in one of my programs doing this, and it's trivial to code.
Thousands of lines should not even be notable; how long does this thing take for your big file, and when it does that, where is it spending the time?
Is 100% CPU use actually a problem? An idle core/CPU is doing nothing at all; it's waiting on the user to type something, the disk to spin up, the network to connect, or the like.
If processing 500 lines takes more than a couple of milliseconds, something is wrong somewhere. But it's not in the code we can see, or at least I don't think so... that hash insert may be worth a look. That is easy to test: take the insert line out and see how fast it runs then.