Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, knowledge of the genomic variation of the immunoglobulin loci is scarce. Here, we report 25 novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, of which ten were successfully confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-region in the 5'UTR, leader 1, and leader 2 sequences, and found that...
These loci remain incompletely characterized because they contain many repetitive sequence segments with many duplicated genes [4], which makes it difficult to correctly assemble short reads from whole genome sequencing. Single nucleotide polymorphisms as well as copy number variations are in linkage disequilibrium and make up distinct haplotypes [4]. To date, a limited number of genomically sequenced [5-7] and inferred [8,9] haplotypes of the heavy chain and the two light chain loci have been described. Different databases exist for genomic immune receptor DNA sequences (IMGT/GENE-DB [10]), putative novel variants from inferred data (IgPdb [11]) or entire immune receptor repertoires (OGRDB [12]).
The usage of immunoglobulin heavy chain variable (IGHV) genes and their mutational status are most frequently studied in relation to cancer [13,14], responses to vaccines [15,16], or in autoimmune diseases [17-19].
Most IGHV genes have several allelic variants, and more alleles are being discovered as a result of adaptive immune receptor repertoire-sequencing (AIRR-seq) [20,21]. Software tools such as TIgGER [22,23],
IgDiscover [24] and partis [25] allow germline alleles to be inferred from such repertoire data. Based on these inferred alleles, the data can then be passed to other tools that infer haplotypes and repertoire deletions [26]. Incorrect annotation could lead to wrongly inferred deletions and biased assessments. Therefore, a full overview of germline variants is essential for studying the adaptive immune response with high accuracy.
Some allelic variants have been associated with increased disease susceptibility [27,28], yet the impact of immunoglobulin gene variation on disease risks is still unknown [29]. These regions have not been sufficiently covered in the numerous genome-wide association studies performed to date. More comprehensive maps of polymorphisms are required for proper analysis.
Here, we have used previously generated AIRR-seq data [30] from naïve B cells of 98 Norwegian individuals to identify novel IGHV alleles, a selection of which we then validated from genomic DNA (gDNA) of non-B cells, i.e. T cells and monocytes. We also analyzed the sequences upstream of the V-region, and constructed consensus sequences for the upstream variant...