We study the learnability of symbolic finite state automata (SFA), a model
shown useful in many applications in software verification. The
state-of-the-art literature on this topic follows the query learning paradigm,
and so far all obtained results are positive. We provide a necessary condition
for efficient learnability of SFAs in this paradigm, from which we obtain the
first negative result. The main focus of our work lies in the learnability of
SFAs under the paradigm of identification in the limit using polynomial time
and data, and its strengthening efficient identifiability, which are concerned
with the existence of a systematic set of characteristic samples from which a
learner can correctly infer the target language. We provide a necessary
condition for identification of SFAs in the limit using polynomial time and
data, and a sufficient condition for efficient learnability of SFAs. From these
conditions we derive a positive and a negative result. The performance of a
learning algorithm is typically bounded as a function of the size of the
representation of the target language. Since SFAs, in general, do not have a
canonical form, and there are trade-offs between the complexity of the
predicates on the transitions and the number of transitions, we start by
defining size measures for SFAs. We revisit the complexity of procedures on
SFAs and analyze them according to these measures, paying attention to the
special forms of SFAs: normalized SFAs and neat SFAs, as well as to SFAs over a
monotonic effective Boolean algebra. This is an extended version of the paper
with the same title published in CSL'22.