The rapid growth of IoT-aware e-learning environments has increased the need for recommendation models that capture both semantic information and device-mediated learner interactions.
Figure 1a illustrates that off-policy learning primarily involves two policies: the behavioral policy (\(b\)), also known as the sampling distribution, and the target policy (\(\pi\)), also known as the ...
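
To make the roles of the two policies concrete, the following minimal sketch draws actions from a behavioral policy \(b\) and reweights the observed rewards to estimate the expected reward under a target policy \(\pi\). It uses ordinary importance sampling, which is one common way to relate the two policies and is an assumption here, not necessarily the method used in this work. The two-action setting, the probabilities, the rewards, and the function name `importance_sampling_estimate` are all hypothetical and chosen only for illustration.

```python
import random


def importance_sampling_estimate(actions, rewards, b, pi):
    """Estimate the expected reward under pi from actions sampled under b.

    Each reward is reweighted by the ratio pi(a) / b(a), so b must assign
    nonzero probability to every action that appears in `actions`.
    """
    total = 0.0
    for a, r in zip(actions, rewards):
        total += (pi[a] / b[a]) * r
    return total / len(actions)


# Hypothetical policies over two actions (illustrative values only).
b = {"left": 0.5, "right": 0.5}    # behavioral policy: generates the data
pi = {"left": 0.1, "right": 0.9}   # target policy: the one being evaluated
reward = {"left": 0.0, "right": 1.0}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
actions = rng.choices(list(b), weights=list(b.values()), k=10_000)
rewards = [reward[a] for a in actions]

estimate = importance_sampling_estimate(actions, rewards, b, pi)
true_value = sum(pi[a] * reward[a] for a in pi)  # exact expectation under pi
print(f"estimate={estimate:.3f}, true value={true_value:.3f}")
```

Although every action is sampled from \(b\), the weighted average approaches the expected reward under \(\pi\) as the number of samples grows. The estimate is only well defined when \(b\) gives nonzero probability to every action that \(\pi\) can take.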