score:2

Accepted answer

The ε-greedy policy selects a random action with probability ε and the best known action with probability 1-ε. At ε=1 it always picks a random action; at ε=0 it always picks the best known action. The value of ε therefore controls the trade-off between exploration and exploitation: you want to use the knowledge you already have, but you also want to keep searching for better alternatives.
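
To make the rule concrete, here is a minimal sketch in Python. The function name `epsilon_greedy` and the `q_values` list (one estimated value per action) are my own assumptions for illustration, not something from the question, and ties for the best action are simply broken by lowest index.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from q_values (hypothetical per-action value estimates)."""
    # Explore: with probability epsilon, pick uniformly among all actions.
    # Note the random pick can still happen to be the best action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: otherwise pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]  # made-up numbers, for illustration only
    print(epsilon_greedy(estimates, epsilon=0.0))  # always the greedy action
    print(epsilon_greedy(estimates, epsilon=1.0))  # always a uniformly random action
```

Since `rng.random()` returns a value in [0, 1), the comparison is never true for ε=0 and always true for ε=1, which matches the two extreme cases described above.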