Purpose: This study proposed a three-dimensional (3D) multi-modal learning-based model for the automated prediction and classification of lymph node metastasis in patients with non-small cell lung cancer (NSCLC) using computed tomography (CT) images and clinical information. Methods: We used clinical information and CT image data from 4239 patients with NSCLC collected at multiple institutions. Four multi-modal models, each based on a different deep learning algorithm, were constructed and evaluated for lymph node classification. To further improve classification performance, a soft-voting ensemble technique was applied to combine the outputs of multiple multi-modal models. Results: The multi-modal models, which integrated CT images and clinical information, outperformed the single-modal models. Among the four multi-modal models, the Xception model achieved the highest classification performance, with an area under the curve (AUC) of 0.756 on the internal test dataset and 0.736 on the external validation dataset. The ensemble model (SEResNet50_DenseNet121_Xception) performed better than the best individual multi-modal model, with an AUC of 0.762 on the internal test dataset and 0.751 on the external validation dataset. Conclusions: Integrating CT images and clinical information improved the performance of lymph node metastasis prediction models in patients with NSCLC. The proposed 3D multi-modal lymph node prediction model can serve as an auxiliary tool for evaluating lymph node metastasis in patients with previously untreated NSCLC, aiding patient screening and treatment planning.
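As an illustration of the soft-voting step named in the Methods, the following minimal Python sketch averages the per-patient metastasis probabilities produced by several models and then thresholds the result. It is not the study's implementation: the function names `soft_vote` and `classify`, the equal default weights, the 0.5 decision threshold, and the example probabilities are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the weighted mean of each patient's predicted probability.

    model_probs[m][i] is model m's predicted probability of lymph node
    metastasis for patient i. Equal weights are an assumption here.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    n_patients = len(model_probs[0])
    if any(len(p) != n_patients for p in model_probs):
        raise ValueError("all models must score the same patients")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, model_probs)) / total
        for i in range(n_patients)
    ]


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert probabilities to binary labels (1 = predicted metastasis).

    The 0.5 threshold is a hypothetical choice, not taken from the study.
    """
    return [int(p >= threshold) for p in probs]


# Hypothetical probabilities for three patients from three models.
seresnet50 = [0.75, 0.25, 0.5]
densenet121 = [0.5, 0.25, 0.75]
xception = [0.25, 0.25, 1.0]

ensemble_probs = soft_vote([seresnet50, densenet121, xception])
ensemble_labels = classify(ensemble_probs)
```

Because soft voting averages probabilities rather than counting hard votes, a confident prediction from one model can outweigh weak disagreement from the others, which is one plausible reason an ensemble can score a higher AUC than its individual members.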