The next concept to understand is the total time through a system for one request. Let's say you want a hamburger from McDonald's. You walk in the door and there are four people in line in front of you (just like normal). The person taking and filling orders takes 2 minutes per customer, so the person being waited on now will be done in 2 minutes, then the next person in line gets their 2 minutes, and so on. In all it will be 8 minutes before you give the friendly counter person your order, and 10 minutes total before you get your food.
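To make the arithmetic concrete, here is a tiny sketch of the total-time calculation; the function name and numbers are just illustrative:

```python
def total_time(queue_length, service_time):
    """Minutes until your order is filled: waiting for everyone ahead
    of you, plus your own service time."""
    return queue_length * service_time + service_time

# Four people ahead, 2 minutes each:
print(total_time(4, 2))  # 10 minutes: 8 waiting in line, 2 being served

# Same line, but service cut to 1 minute:
print(total_time(4, 1))  # 5 minutes total
```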
If we cut the time it takes for the counter person to wait on someone from 2 minutes to 1, everything moves faster. Each person takes 1 minute, so you have your food in 5. But since each person spends less time in line, the probability that there will be 4 people in front of you in the first place decreases as well. So improving the processing time of a resource that has a queue waiting on it provides a potentially exponential improvement in total time through the system. This is why we address bottlenecked resources first.
The reason we say "potentially exponential" improvement is that the actual improvement depends on a number of factors, which are discussed later. Basically, the processing time is most likely not constant, but depends on many factors, including the availability of other system resources on which it could be blocked waiting for a response. In addition, the arrival rate of requests is probably not constant either, but instead follows some kind of distribution. So if you happen to arrive during a burst of customers, like at lunch hour, when the kitchen is backed up getting hamburgers out for the drive-through, you could still see a 2-minute processing time and 4 people in line in front of you. Actually determining the probabilities of these events is an exercise left to the reader, as is hitting yourself in the head with a brick. The idea here is just to introduce you to the concepts so you understand why your system performs as it does.
Another concept in high performance computing is parallel computing, used for applications like Beowulf clusters. If your application is written correctly, this is like walking into McDonald's and seeing 4 people in line, but with 4 lines open. Each person ahead of you takes a different line, so you wait only about 2 minutes behind one of them and have your burger in about 4 minutes instead of 10. The downside is that the manager of the store has to pay for more people working in the store, or you have to buy more machines. There are also potential bottlenecks, such as only one person in the back making fries, or the person ahead of you still trying to decide what they want.
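The multiple-lines case can be sketched the same way. This models the idealized situation where each customer goes to the counter that frees up first (a simplification of picking the shortest line); the rates and counts are illustrative:

```python
import heapq
import random

def simulate_lines(n_lines, arrival_rate, service_time,
                   n_customers=50_000, seed=7):
    """n_lines parallel counters with fixed service time and Poisson
    arrivals; each customer takes the counter that frees up soonest.
    Returns the average time in the system, in minutes."""
    rng = random.Random(seed)
    free_at = [0.0] * n_lines  # when each counter is next available
    heapq.heapify(free_at)
    clock = 0.0
    total = 0.0
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)
        soonest = heapq.heappop(free_at)   # counter that frees up first
        start = max(clock, soonest)
        done = start + service_time
        heapq.heappush(free_at, done)
        total += done - clock
    return total / n_customers

# Same stream of customers, one line versus four open lines.
one_line = simulate_lines(1, 0.45, 2.0)
four_lines = simulate_lines(4, 0.45, 2.0)
print(f"1 line:  {one_line:.1f} min average time in system")
print(f"4 lines: {four_lines:.1f} min average time in system")
```

With four lines open the waiting nearly vanishes and the time in the system approaches the bare 2-minute service time, which is the payoff the manager is weighing against the cost of the extra staff.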