Emotion recognition in conversations (ERC) typically requires modeling both intra- and inter-speaker context dependencies. However, existing inter-speaker modeling may not capture differences among the other participants in the conversation. Recent ERC research has attempted to improve utterance representations by incorporating speakers’ commonsense knowledge. Nonetheless, these studies ignore the causal consistency of knowledge between the two participants, which conflicts with the speaker-sensitive modeling of context dependencies described above. Additionally, we observe that historical utterances from various topics are leveraged indiscriminately in context modeling, which fails to preserve inter- and intra-topic coherence. To address these issues, we propose the topic- and causal-aware interactive graph network (TCA-IGN). Specifically, we design a graph encoder to model topic-level context dependencies, achieving inter- and intra-topic coherence. The topics of utterances are derived from a context-sensitive neural topic model. We then present a causal-aware graph attention mechanism that maintains the speaker’s causal consistency in commonsense knowledge, improving speaker-level context modeling. Finally, to compensate for the shortcomings of modeling inter-speaker or inter-topic context dependencies, we employ supervised contrastive learning. Experimental results show that TCA-IGN outperforms state-of-the-art methods on three public conversational datasets.
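
To make the supervised contrastive learning step more concrete, the following is a minimal sketch of one commonly used form of the supervised contrastive loss, written in plain Python. It is an illustration under stated assumptions, not the TCA-IGN implementation: the function name `supervised_contrastive_loss`, the `temperature` default, and the use of cosine similarity over utterance embeddings are all hypothetical choices made for this example. Utterances that share an emotion label are treated as positives and are pulled together, while all other utterances in the batch are pushed apart.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch (illustrative only).

    embeddings: list of equal-length float vectors, one per utterance.
    labels:     list of emotion labels, one per utterance.
    temperature: assumed scaling factor; not taken from the paper.
    """
    # L2-normalize so that dot products become cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, num_anchors = 0.0, 0
    for i in range(n):
        # Scaled similarity between anchor i and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Positives are the other samples that share the anchor's label.
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )

        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1

    return total / num_anchors if num_anchors else 0.0
```

Under these assumptions, a batch whose embeddings already cluster by emotion label yields a lower loss than the same embeddings paired with labels that cut across the clusters, which is the behavior that encourages same-emotion utterances to share a region of the representation space.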