Purpose: Several types of structural heart intervention (SHI) use information from multiple imaging modalities to complete an interventional task. For example, in transcatheter aortic valve replacement (TAVR), placement and deployment of a bioprosthetic aortic valve in the aorta is primarily guided by x-ray fluoroscopy (XRF), and echocardiography provides visualization of cardiac anatomy and blood flow. However, simultaneous interpretation of independent x-ray and echo displays remains a challenge for the interventionalist. The purpose of this work was to develop a novel echo/x-ray co-registration solution in which volumetric transthoracic echo (TTE) is transformed to the x-ray coordinate system by tracking the three-dimensional (3D) pose of a probe fiducial attachment from its appearance in two-dimensional (2D) x-ray images. Methods: A fiducial attachment for a commercial TTE probe consisting of rings of high-contrast ball bearings was designed and fabricated. The 3D pose (position and orientation) of the fiducial attachment is estimated from a 2D x-ray image using an algorithm in which a virtual point cloud model of the attachment is iteratively rotated, translated, and forward-projected onto the image until the average sum-of-squares of grayscale values at the projected points is minimized. Fiducial registration error (FRE) and target registration error (TRE) of this approach were evaluated in phantom studies using TAVR-relevant gantry orientations and four standard acoustic windows for the TTE probe. A patient study was conducted to assess the clinical suitability of the fiducial attachment prototype during TTE imaging of patients undergoing SHI. TTE image quality for the task of guiding a transcatheter procedure was evaluated in a reviewer study. Results: The 3D FRE ranged from 0.32 ± 0.03 mm (mean ± SD) to 1.31 ± 0.05 mm, depending on C-arm orientation and probe acoustic window. The 3D TRE ranged from 1.06 ± 0.03 mm to 2.42 ± 0.06 mm.
Fiducial pose estimation was stable when >75% of the fiducial markers were visible in the x-ray image. A panel of reviewers graded the presentation of heart valves in TTE images from 48 SHI patients. While valve presentation did not differ significantly between acoustic windows (P > 0.05), the mitral valve achieved significantly higher image quality than the aortic and tricuspid valves (P < 0.001). Overall, reviewers perceived sufficient image quality in 76.5% of images of the mitral valve, 54.9% of images of the aortic valve, and 48.6% of images of the tricuspid valve. Conclusions: Fiducial-based tracking of a commercial TTE probe is compatible with clinical SHI workflows and yields a 3D target registration error of less than 2.5 mm for a variety of x-ray gantry geometries and echo probe acoustic windows. Although TTE image quality with respect to target valve anatomy was sufficient for the majority of cases examined, prescreening of patients for sufficient TTE quality would be helpful.
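
The pose-estimation step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation; everything beyond the cost definition (mean of squared grayscale values at the forward-projected model points) is an assumption made for illustration. Assumed here: a pinhole geometry with the x-ray source at the origin and the detector at distance `sid`, an Euler-angle pose parametrization, nearest-pixel sampling, markers that appear dark on a bright background, a simple pattern search as the optimizer, and a hypothetical two-ring marker layout with made-up dimensions.

```python
import numpy as np

def rotation_matrix(rx, ry, rz):
    """Rotation from Euler angles in radians, composed as Rz @ Ry @ Rx."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(points, pose, sid, pixel_mm, center):
    """Rotate/translate model points, then perspective-project to pixel (u, v).
    pose = (rx, ry, rz, tx, ty, tz); assumed geometry, not from the paper."""
    p = points @ rotation_matrix(*pose[:3]).T + pose[3:]
    u = sid * p[:, 0] / p[:, 2] / pixel_mm + center[0]
    v = sid * p[:, 1] / p[:, 2] / pixel_mm + center[1]
    return np.stack([u, v], axis=1)

def cost(pose, points, image, sid, pixel_mm, center):
    """Mean squared grayscale at the projected points (nearest pixel).
    Points falling outside the image are clamped to the border."""
    uv = np.rint(project(points, pose, sid, pixel_mm, center)).astype(int)
    h, w = image.shape
    cols = np.clip(uv[:, 0], 0, w - 1)
    rows = np.clip(uv[:, 1], 0, h - 1)
    return float(np.mean(image[rows, cols].astype(float) ** 2))

def pattern_search(pose0, steps, f, n_iter=50):
    """Derivative-free search: try +/- steps per parameter, keep improvements,
    halve the steps when no trial improves. The cost never increases."""
    pose = np.asarray(pose0, dtype=float)
    steps = np.asarray(steps, dtype=float).copy()
    best = f(pose)
    for _ in range(n_iter):
        improved = False
        for i in range(6):
            for sign in (1.0, -1.0):
                trial = pose.copy()
                trial[i] += sign * steps[i]
                c = f(trial)
                if c < best:
                    pose, best, improved = trial, c, True
        if not improved:
            steps *= 0.5
    return pose, best

def ring(radius, n, z):
    """n marker centers on a circle of the given radius (mm) at height z."""
    a = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.stack([radius * np.cos(a), radius * np.sin(a), np.full(n, z)], axis=1)

def render(points, pose, shape, sid, pixel_mm, center, sigma=3.0):
    """Synthetic x-ray: bright background with dark Gaussian blobs at markers."""
    uv = project(points, pose, sid, pixel_mm, center)
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (cols[..., None] - uv[:, 0]) ** 2 + (rows[..., None] - uv[:, 1]) ** 2
    return 1.0 - np.exp(-d2 / (2.0 * sigma ** 2)).max(axis=-1)

# Hypothetical fiducial model and imaging parameters (illustrative values only).
model = np.vstack([ring(15.0, 12, 0.0), ring(10.0, 8, 10.0)])
sid, pixel_mm, center = 1000.0, 0.5, (128.0, 128.0)
pose_true = np.array([0.2, -0.1, 0.3, 5.0, -3.0, 700.0])
image = render(model, pose_true, (256, 256), sid, pixel_mm, center)

f = lambda pose: cost(pose, model, image, sid, pixel_mm, center)
pose_start = pose_true + np.array([0.0, 0.0, 0.0, 3.0, 2.0, 0.0])
c_true, c_start = f(pose_true), f(pose_start)
pose_fit, c_fit = pattern_search(pose_start, [0.02, 0.02, 0.02, 1.0, 1.0, 1.0], f)
print(f"cost at true pose {c_true:.4f}, at start {c_start:.4f}, after search {c_fit:.4f}")
```

In this toy setup the cost is smallest when the projected model points land on the dark marker blobs, which is the property the pose search exploits. A practical implementation would likely use sub-pixel interpolation, a calibrated C-arm projection matrix, and a more robust optimizer; depth along the beam axis is only weakly constrained by a single projection, so that parameter typically converges more slowly than in-plane translation.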