I wrote a program to analyse a file: it opens FileA, finds the data I want with a while loop that runs until EOF, and writes it to FileB. Now I have a problem finding duplicates within that single file, since as far as I know a file can only have one read pointer. Is there another way to find the duplicates? If anyone is free to help me solve this, I can send the whole output file.
#include <cstring>
#include <cstdio>

#define LINE_LEN 8

// Read the next LINE_LEN characters into c, skipping newlines.
// Returns false once EOF is reached.
bool read8(FILE* f, char* c)
{
    for (int i = 0; i < LINE_LEN; i++)
    {
        int ch;                              // int, not char, so EOF is detected reliably
        do
        {
            ch = fgetc(f);
            if (ch == EOF)
                return false;
        } while (ch == '\n' || ch == '\r');
        c[i] = (char)ch;
    }
    c[LINE_LEN] = '\0';
    return true;
}

// Scan the output file for a line equal to c, rewinding before returning
// so the next lookup starts from the top again.
bool duplicate(FILE* f, char* c)
{
    static char dup[LINE_LEN + 1];
    while (read8(f, dup))
    {
        if (strcmp(c, dup) == 0)
        {
            rewind(f);
            return true;
        }
    }
    rewind(f);
    return false;
}

int main()
{
    FILE* pFileIn, * pFileOut;
    char buff[LINE_LEN + 1];
    pFileIn = fopen("in.txt", "r");
    pFileOut = fopen("out.txt", "w+");
    if (pFileIn == NULL) perror("Error opening file");
    else
    {
        while (read8(pFileIn, buff))
        {
            if (!duplicate(pFileOut, buff))
            {
                fseek(pFileOut, 0, SEEK_END);    // append after the existing lines
                fprintf(pFileOut, "%s\n", buff);
                rewind(pFileOut);                // back to the top for the next scan
            }
        }
        fclose(pFileIn);
        fclose(pFileOut);
    }
    return 0;
}
I think this is the most memory-efficient way, but you should be able to find a faster one.
I would suggest doing this in C++; your code would probably be way shorter.
Your best bet is to read the file into memory, then work on it there.
Create a vector<string> vFileList; and load each line into it with vFileList.push_back(sCurrentLine);. Once you've loaded the whole file, you can work on it from the vector with no problems, as in the sketch below.
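For instance, a minimal sketch of that loading step (assuming the same in.txt input file as the code above):

#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::ifstream fileIn("in.txt");          // assumed input name, matching the code above
    std::vector<std::string> vFileList;
    std::string sCurrentLine;
    while (std::getline(fileIn, sCurrentLine))
        vFileList.push_back(sCurrentLine);   // whole file ends up in memory
    // ... work on vFileList from here ...
    return 0;
}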
Edit: If you just want to find duplicate lines as fast as possible, use a map. As you load each line, increment its count in the map by one: map<string, int> mLinesCounter; :)
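A minimal sketch of that counting idea (again assuming in.txt as input; the reporting loop at the end is just for illustration):

#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main()
{
    std::ifstream fileIn("in.txt");          // assumed input name
    std::map<std::string, int> mLinesCounter;
    std::string sCurrentLine;
    while (std::getline(fileIn, sCurrentLine))
        ++mLinesCounter[sCurrentLine];       // operator[] starts new entries at 0
    // Report every line that occurred more than once.
    for (std::map<std::string, int>::const_iterator it = mLinesCounter.begin();
         it != mLinesCounter.end(); ++it)
        if (it->second > 1)
            std::cout << it->first << " appears " << it->second << " times\n";
    return 0;
}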