Composable multi-processors employ large instruction windows and a distributed layout, both of which amplify the branch misprediction penalty. By the time a misprediction is detected, hundreds or thousands of instructions may be in flight, so simply squashing every instruction after the mispredicted branch wastes a large amount of work. Branch misprediction therefore becomes a key bottleneck in these systems. In this paper, we introduce Distributed Control Independence (DCI) to reduce the branch misprediction bottleneck in a composable multi-processor named TFlex. With control independence, the misprediction penalty can be alleviated by saving the useful work already done by later control-independent instructions. We find that only a small fraction of the saved instructions need to be re-executed: those whose input data depend on control-dependent instructions. DCI achieves high hardware efficiency and performance scalability. Our experimental results show that DCI effectively mitigates the branch misprediction bottleneck and speeds up the baseline TFlex by a geometric mean of 35% across diverse applications on a 16-core TFlex configuration.
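The selective re-execution rule above can be illustrated with a small sketch. It assumes a simplified, register-only dataflow model (memory dependences are ignored), and the `Inst` record and `select_reexec` function are hypothetical names introduced for illustration; this is not the DCI hardware mechanism itself. A saved control-independent instruction is marked for re-execution if it reads a register written by a control-dependent instruction (on either the wrong or the correct path), or a register written by another instruction already marked for re-execution.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Inst:
    """Hypothetical instruction record: name, destination register, source registers."""
    name: str
    dst: Optional[str]
    srcs: Tuple[str, ...]


def select_reexec(wrong_cd: List[Inst], correct_cd: List[Inst], ci: List[Inst]) -> List[str]:
    """Return names of control-independent (CI) instructions that must re-execute.

    Assumption: only register dataflow is tracked. Registers written by
    control-dependent (CD) instructions on either path may hold values that
    differ from what the CI instructions originally consumed.
    """
    tainted = {i.dst for i in wrong_cd + correct_cd if i.dst is not None}
    reexec = []
    for inst in ci:  # CI instructions in program order
        if any(src in tainted for src in inst.srcs):
            # Reads a possibly stale value: must re-execute, and its result is suspect too.
            reexec.append(inst.name)
            if inst.dst is not None:
                tainted.add(inst.dst)
        elif inst.dst is not None:
            # Unaffected instruction overwrites the register, so later readers see a valid value.
            tainted.discard(inst.dst)
    return reexec


if __name__ == "__main__":
    wrong = [Inst("w1", "r1", ("r0",))]
    correct = [Inst("c1", "r2", ("r0",))]
    ci = [
        Inst("i1", "r3", ("r4",)),
        Inst("i2", "r5", ("r1",)),  # reads a CD result
        Inst("i3", "r6", ("r5",)),  # transitively dependent on i2
        Inst("i4", "r7", ("r3",)),
        Inst("i5", "r2", ("r8",)),  # overwrites r2 with an unaffected value
        Inst("i6", "r9", ("r2",)),  # reads i5's value, so it is safe
    ]
    print(select_reexec(wrong, correct, ci))  # ['i2', 'i3']
```

In this toy trace only two of the six saved instructions re-execute, which mirrors the observation that most control-independent work can be kept.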