OBJECTIVES: Electronic health record (EHR)-based surveillance systems are being actively developed to detect adverse drug reactions (ADRs), but progress is hindered by the difficulty of extracting data from unstructured records. This study analyzed ADRs in nursing notes for drug safety surveillance using temporal difference (TD) learning, a reinforcement learning method. METHODS: Nursing notes of 8,316 patients (4,158 ADR and 4,158 non-ADR cases) admitted to Ajou University Hospital were used for the ADR classification task. A TD(lambda) model was used to estimate state values that indicate ADR risk. For TD learning, each nursing phrase was encoded into one of seven states, and the state values estimated during training were used in the subsequent testing phase. Logistic regression was then applied to the state values from the TD(lambda) model to perform the classification. RESULTS: The overall accuracy of TD-based logistic regression (0.63) was comparable to that of two machine-learning methods (0.64 for a naive Bayes classifier and 0.63 for a support vector machine) and higher than that of two deep learning-based methods (0.58 for a text convolutional neural network and 0.61 for a long short-term memory neural network). Most importantly, the TD-based method was able to estimate state values according to the context of nursing phrases. CONCLUSIONS: TD learning is a promising approach because it can exploit contextual, time-dependent aspects of the available data and provide an analysis of the severity of ADRs in a fully incremental manner.
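
The state-value estimation described in METHODS can be illustrated with a minimal tabular TD(lambda) sketch using accumulating eligibility traces. This is not the study's implementation. The abstract specifies only the seven states, so the reward scheme (a terminal reward of 1.0 for an ADR case and 0.0 otherwise), the hyperparameters `alpha`, `gamma`, and `lam`, the function names, and the toy sequences are all assumptions made for illustration.

```python
N_STATES = 7  # from the abstract: each nursing phrase maps to one of seven states


def td_lambda_values(episodes, alpha=0.1, gamma=0.9, lam=0.8):
    """Estimate state values with tabular TD(lambda), accumulating traces.

    episodes: list of (states, terminal_reward) pairs. `states` is one
    patient's sequence of state indices in [0, N_STATES). The terminal
    reward (1.0 for ADR, 0.0 for non-ADR) is an assumed reward scheme.
    """
    values = [0.0] * N_STATES
    for states, terminal_reward in episodes:
        traces = [0.0] * N_STATES  # eligibility traces reset per episode
        for t, s in enumerate(states):
            last = t == len(states) - 1
            reward = terminal_reward if last else 0.0
            next_value = 0.0 if last else values[states[t + 1]]
            delta = reward + gamma * next_value - values[s]  # TD error
            traces[s] += 1.0
            for i in range(N_STATES):
                values[i] += alpha * delta * traces[i]
                traces[i] *= gamma * lam  # decay every trace
    return values


def note_features(states, values):
    """Summarize one phrase sequence as (mean, max) of its state values.

    Hypothetical feature choice; features like these could then be fed
    to a logistic regression classifier.
    """
    vs = [values[s] for s in states]
    return (sum(vs) / len(vs), max(vs))


# Toy data (invented): one ADR-like and one non-ADR-like sequence, repeated.
toy_episodes = [([0, 2, 5], 1.0), ([0, 1, 3], 0.0)] * 50
toy_values = td_lambda_values(toy_episodes)
```

On this toy data, the state that ends ADR sequences receives a higher value than the state that ends non-ADR sequences. Because each update uses only the current transition and the traces, the estimates can be refined as new notes arrive, which is the incremental property the CONCLUSIONS refer to.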