The outstanding performance recently achieved by Neural Language Models (NLMs) across many Natural Language Processing (NLP) tasks has fostered debate on whether NLMs implicitly learn linguistic competence. Probes, i.e. supervised models trained on NLM representations to predict linguistic properties, are frequently adopted to investigate this issue. However, it remains unclear whether probing classification tasks truly enable such an investigation or whether they merely pick up surface patterns in the data. This work contributes to this debate by presenting an approach to assess the effectiveness of a suite of probing tasks aimed at testing the linguistic knowledge implicitly encoded by one of the most prominent NLMs, BERT. To this end, we compared the performance of probes when predicting gold and automatically altered values of a set of linguistic features. Our experiments, performed on Italian, extend the work of Miaschi et al. [1] by evaluating the results across BERT layers and for sentences of different lengths. As a general result, we observed higher performance in the prediction of gold values, suggesting that the probing model is sensitive to the distortion of feature values. However, our experiments also showed that sentence length is a highly influential factor that can confound the probing model's predictions.
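The following is a minimal, illustrative sketch of the general probing setup described above, not the paper's actual pipeline: a linear probe is trained on frozen BERT sentence representations to predict a linguistic feature, once with gold values and once with automatically altered (here, shuffled) values, and the two scores are compared. The checkpoint name `dbmdz/bert-base-italian-cased`, the toy sentences, the choice of layer, and the use of token count as a stand-in feature are all assumptions made for the sake of the example.

```python
# Sketch: compare a linear probe's performance on gold vs. altered feature values.
# All data, the model checkpoint, and the target feature are illustrative assumptions.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

MODEL_NAME = "dbmdz/bert-base-italian-cased"  # assumed Italian BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def sentence_embedding(sentence: str, layer: int = 12) -> np.ndarray:
    """Mean-pool token representations from a given BERT layer."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states  # tuple: embeddings + 12 layers
    return hidden_states[layer].squeeze(0).mean(dim=0).numpy()

# Toy data: sentences paired with a gold feature value
# (token count used here as a stand-in for any linguistic feature).
sentences = [
    "Il gatto dorme sul divano.",
    "Domani andremo al mare con gli amici.",
    "Piove.",
    "La commissione ha approvato la proposta dopo una lunga discussione.",
]
gold = np.array([len(s.split()) for s in sentences], dtype=float)

X = np.stack([sentence_embedding(s, layer=12) for s in sentences])

# Altered values: shuffle the gold feature across sentences.
rng = np.random.default_rng(0)
altered = rng.permutation(gold)

# Compare probe performance (R^2 via cross-validation) on gold vs. altered values.
probe = LinearRegression()
print("gold   :", cross_val_score(probe, X, gold, cv=2).mean())
print("altered:", cross_val_score(probe, X, altered, cv=2).mean())
```

Under the hypothesis tested in the paper, the probe trained on gold values should outperform the one trained on altered values; running the same comparison per layer and per sentence-length bin would mirror the evaluation dimensions mentioned above.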