You could use a profiler, or you could write a benchmark, but the most reliable approach is to time and compare the different versions of your code in the actual situation where they will be used.
Yes. You will probably need to loop 10,000 times or more to get a meaningful measurement. The resolution of clock() is one tick, which is relatively coarse.
Take a look at <chrono> for more precise measurement tools.
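For example, here is a minimal sketch using std::chrono::steady_clock (the loop body is just a placeholder for whatever you want to time):

#include <chrono>
#include <iostream>

int main()
{
    using namespace std::chrono;

    auto start = steady_clock::now();

    // ... the code you want to time goes here; this loop is a placeholder ...
    volatile long long sink = 0;
    for (long long i = 0; i < 10000000; ++i)
        sink += i;

    auto stop = steady_clock::now();

    // duration_cast converts the elapsed time to the unit you ask for
    auto ms = duration_cast<milliseconds>(stop - start).count();
    std::cout << "Time elapsed = " << ms << " ms\n";
}

steady_clock is a good default because it is monotonic; high_resolution_clock also exists, but it is often just an alias for one of the other clocks.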
I would make sure you didn't include the output (notoriously slow, and possibly buffered) within the timed section.
I am not gonna lie, I did not understand this line.
I forgot that clock() returns ticks, not seconds. I will change my line to: cout << "Time elapsed = " << (clock() - startTime) * 1000.0 / CLOCKS_PER_SEC; to get the result in milliseconds (dividing by CLOCKS_PER_SEC rather than a hard-coded 1000, since CLOCKS_PER_SEC isn't 1000 on every platform), or use the one suggested by @lastChance: cout << "Time elapsed = " << endTime - startTime;
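Here is a minimal, self-contained sketch of the clock() version with the portable conversion (the loop is just placeholder work):

#include <ctime>
#include <iostream>

int main()
{
    std::clock_t startTime = std::clock();

    // ... work to be timed; this loop is a placeholder ...
    volatile double x = 0;
    for (int i = 0; i < 50000000; ++i)
        x += i * 0.5;

    std::clock_t endTime = std::clock();

    // clock() counts ticks; CLOCKS_PER_SEC ticks make one second.
    // CLOCKS_PER_SEC is 1000 on some platforms but 1000000 on others,
    // so divide by it instead of hard-coding 1000.
    double ms = 1000.0 * (endTime - startTime) / CLOCKS_PER_SEC;
    std::cout << "Time elapsed = " << ms << " ms\n";
}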
By "profiler" I mean something like Callgrind (a valgrind tool) + kchachegrind (to view the result). By running your program with a profiler attached it will afterwards be able to tell you which parts of the program where it spends most time so that you know what to optimize.
By "benchmark" I mean writing multiple test programs that does the same thing in different ways to see which one runs faster. Writing a good benchmark is hard. If you're not careful compiler optimizations might optimize away things that you want to measure and running code inside a tight loop might not necessarily give you a result that is representable for how it will perform when you run it only once in a while.
Instruction count is exceedingly misleading. On PC hardware, at least, some instructions take much, much longer than others.
Timing code can also show memory-access oddities: if you run the same loop 50 times, the second or third pass will be faster than the first one or two, because by then the data and code are already in cache.
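You can see that warm-up effect with a sketch like this, which times the same pass several times in a row (the exact numbers will vary by machine):

#include <chrono>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> data(1000000, 1);
    volatile long long sink = 0;

    for (int pass = 0; pass < 5; ++pass)
    {
        auto start = std::chrono::steady_clock::now();

        long long sum = 0;
        for (int v : data)
            sum += v;
        sink += sum;   // keep the work from being optimized away

        auto stop = std::chrono::steady_clock::now();
        std::cout << "pass " << pass << ": "
                  << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
                  << " us\n";
    }
}

On many machines the first pass has to pull the vector into cache; later passes find it already there.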
Performance is kind of a dark art. You have the actual algorithm (whether it's efficient or not); then the implementation, where you minimize the local-detail costs; then threading considerations, where you may do more total work to save wall-clock time (e.g., sorting 10 segments of an array and merging them back together may be faster than sorting the whole thing as one block, even though it does extra steps); and many other things.
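As a rough sketch of that sort-and-merge idea (two segments rather than ten, and all the names are my own; compile with -pthread on Linux), sorting the halves on two threads and then merging does extra work but can finish sooner on the wall clock:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <thread>
#include <vector>

int main()
{
    std::vector<int> v(5000000);
    std::mt19937 rng(42);
    for (int& x : v)
        x = static_cast<int>(rng());

    auto mid = v.begin() + v.size() / 2;

    auto start = std::chrono::steady_clock::now();

    // Sort the two halves in parallel, then pay for the extra merge pass.
    std::thread t([&]{ std::sort(v.begin(), mid); });
    std::sort(mid, v.end());
    t.join();
    std::inplace_merge(v.begin(), mid, v.end());

    auto stop = std::chrono::steady_clock::now();
    std::cout << "sorted: " << std::boolalpha << std::is_sorted(v.begin(), v.end())
              << ", " << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}

Whether this actually beats a single std::sort depends on the data size and the machine, which is exactly the point: measure it.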
So you can compare algorithms and count loops or whatever, but the big picture is harder to get a handle on. Profilers are smarter, but they are also GIGO engines: asking one to profile bubble sort won't tell you that it's a bad algorithm; it just tells you the program spent a lot of time in that function or in its loops.