The model learns by taking a piece of text from the data (say, the opening sentence of a Wikipedia article) and trying to predict the next token in the sequence. It then compares its prediction with the actual text in the training corpus and adjusts its parameters to reduce the error.
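To make the predict, compare, adjust loop concrete, here is a minimal sketch in Python. It is a deliberate simplification: it assumes a tiny whitespace-tokenized corpus and uses a bigram table of logits (one row per previous token) as a stand-in for a real neural network, which would condition on far more context. The function names `train_bigram` and `predict_next`, the sample sentence, and the learning rate are illustrative choices, not part of any particular model or library.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(tokens, epochs=200, lr=0.5):
    """Fit a toy next-token predictor with gradient descent (hypothetical helper)."""
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(vocab)
    # weights[i][j] is the score for token j following token i.
    weights = [[0.0] * n for _ in range(n)]
    losses = []
    for _ in range(epochs):
        total_loss = 0.0
        for prev, nxt in zip(tokens, tokens[1:]):
            i, target = index[prev], index[nxt]
            # 1. Predict: a probability for every possible next token.
            probs = softmax(weights[i])
            # 2. Compare: cross-entropy against the token that actually follows.
            total_loss += -math.log(probs[target])
            # 3. Adjust: the gradient of the loss w.r.t. the scores is
            #    (probs - one_hot(target)), so step in the opposite direction.
            for j in range(n):
                grad = probs[j] - (1.0 if j == target else 0.0)
                weights[i][j] -= lr * grad
        losses.append(total_loss / (len(tokens) - 1))
    return vocab, index, weights, losses


def predict_next(token, vocab, index, weights):
    """Return the most probable next token after `token`."""
    probs = softmax(weights[index[token]])
    best = max(range(len(vocab)), key=lambda j: probs[j])
    return vocab[best]


# Assumed toy corpus; a real training corpus would be vastly larger.
text = "the cat sat on the mat and the cat slept".split()
vocab, index, weights, losses = train_bigram(text)
print(predict_next("the", vocab, index, weights))
print(losses[0], losses[-1])
```

In this sketch the average loss falls as training proceeds, and the table ends up favoring the continuations that appear most often in the corpus. Large language models follow the same loop in principle, but with billions of parameters updated by backpropagation instead of a small lookup table.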
Eve