Electrocardiography (ECG) wave morphology and timing provide critical information for diagnosing arrhythmias and conduction abnormalities and for stratifying risk in various cardiac diseases. However, accurate extraction of these features becomes challenging when waves from distinct cardiac chambers are superimposed, a common occurrence during pathological rhythms. This work proposes a novel Surrogate-boosted Temporal Contrastive Representation Learning (S-TCRL) framework to address this challenge. S-TCRL leverages weak labels, readily obtainable from invasive catheter examinations, to extract latent representations of superimposed P waves. By employing surrogate labels, we reformulate the problem from object-wise to sample-wise incomplete information. A 1D fully convolutional feature pyramid network (FPN) extracts multi-scale features from ECG signals. These features are segmented into equal-sized temporal regions, whose labels are inferred from individual samples using a multiple-instance learning (MIL) paradigm. Non-sequential embeddings are generated to facilitate alignment-free cosine similarity estimation. A temperature-scaled cross-entropy loss minimizes the distance between embeddings of similar regions (likely containing P waves) while maximizing the distance between dissimilar ones. The framework is evaluated on a custom ECG dataset comprising 3265 short-term recordings from 708 individuals undergoing catheter ablation. S-TCRL improves on two baseline MIL methods in the downstream P wave segmentation task: the average recall and precision for both P wave boundaries reach 70.0% and 80.0%, respectively, exceeding the baselines' 63.5% and 67.5%. These results demonstrate the potential of S-TCRL for learning embedding representations of superimposed P waves and suggest that it may generalize to other tasks such as arrhythmia classification.
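To make the region labeling and the contrastive objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard MIL rule (a region is positive if any sample inside it is positive) and a supervised-contrastive form of the temperature-scaled cross-entropy over cosine similarities. The function names `region_labels` and `contrastive_loss`, the temperature of 0.1, and the averaging over positives are illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def region_labels(sample_labels, region_size):
    """Assumed MIL rule: a region is positive if any of its samples is positive.

    Trailing samples that do not fill a whole region are dropped.
    """
    n_regions = len(sample_labels) // region_size
    trimmed = np.asarray(sample_labels[: n_regions * region_size])
    return trimmed.reshape(n_regions, region_size).max(axis=1)


def contrastive_loss(embeddings, labels, temperature=0.1):
    """Temperature-scaled cross-entropy over pairwise cosine similarities.

    Expects at least two embeddings. Regions sharing a label are treated as
    positives for each other; anchors without any positive are skipped.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> cosine
    sim = (z @ z.T) / temperature

    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # exclude self-pairs

    # Row-wise log-softmax, stabilised by subtracting the row maximum.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)

    # Average log-probability of the positives for each anchor.
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos.sum(axis=1)[has_pos]
    return per_anchor.mean()


# Toy usage: 9 sample-wise labels grouped into regions of 3 samples each.
labels = region_labels([0, 0, 1, 0, 0, 0, 0, 1, 1], region_size=3)

# Hypothetical 2-D region embeddings; in practice these would come from the FPN.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matched = contrastive_loss(emb, [1, 1, 0, 0])
loss_mismatched = contrastive_loss(emb, [1, 0, 1, 0])
```

In this toy setup the loss is lower when the labels agree with the geometric clusters of the embeddings than when they cut across them, which is the behavior the objective is meant to reward. Because the embeddings are compared by cosine similarity only, no temporal alignment between regions is needed.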