In this paper, we study the smallest non-zero eigenvalue of the sample covariance matrix $$\mathcal{S}(Y)=YY^*$$, where $$Y=(y_{ij})$$ is an $$M\times N$$ matrix with i.i.d. entries of mean 0 and variance $$N^{-1}$$. We consider the regime $$M=M(N)$$ and $$M/N\rightarrow c_\infty \in \mathbb{R}\setminus \{1\}$$ as $$N\rightarrow \infty$$. It is known that for the extreme eigenvalues of Wigner matrices and the largest eigenvalue of $$\mathcal{S}(Y)$$, a weak fourth-moment condition is necessary and sufficient for the Tracy–Widom law (Ding and Yang in Ann Appl Probab 28(3):1679–1738, 2018. https://doi.org/10.1214/17-AAP1341; Lee and Yin in Duke Math J 163(1):117–173, 2014. https://doi.org/10.1215/00127094-2414767). In this paper, we show that the Tracy–Widom law is more robust for the smallest eigenvalue of $$\mathcal{S}(Y)$$, by discovering a phase transition induced by the fatness of the tail of the $$y_{ij}$$’s. More specifically, we assume that $$y_{ij}$$ is symmetrically distributed with tail probability $$\mathbb{P}(|\sqrt{N}y_{ij}|\ge x)\sim x^{-\alpha}$$ as $$x\rightarrow \infty$$, for some $$\alpha \in (2,4)$$. We establish the following: (1) when $$\alpha >\frac{8}{3}$$, the smallest eigenvalue follows the Tracy–Widom law on scale $$N^{-\frac{2}{3}}$$; (2) when $$2<\alpha <\frac{8}{3}$$, the smallest eigenvalue follows the Gaussian law on scale $$N^{-\frac{\alpha}{4}}$$; (3) when $$\alpha =\frac{8}{3}$$, the distribution is given by an interpolation between Tracy–Widom and Gaussian; (4) when $$\alpha \le \frac{10}{3}$$, in addition to the left edge of the Marchenko–Pastur (MP) law, a deterministic shift of order $$N^{1-\frac{\alpha}{2}}$$ must be subtracted from the smallest eigenvalue, in both the Tracy–Widom law and the Gaussian law. Overall, our proof strategy is inspired by Aggarwal et al. (J Eur Math Soc 23(11):3707–3800, 2021. https://doi.org/10.4171/jems/1089), which was originally developed for the bulk regime of Lévy Wigner matrices. In addition to various technical complications arising from the bulk-to-edge extension, two ingredients are needed for our derivation: an intermediate left edge local law based on a simple but effective matrix minor argument, and a mesoscopic CLT for the linear spectral statistic with an asymptotic expansion for its expectation.