Integrating knowledge of the sequence-structure correlation of proteins provides a basis for the structural design of novel artificial proteins. One effective strategy is to treat a short segment, intermediate in size between an amino acid and a domain, as the correlation unit for exploring the structure-to-sequence relationship. Here we report the development of a database called ProSeg, which consists of two sub-databases, Segment DB and Cluster DB. Segment DB contains tens of thousands of segments that were prepared by dividing the primary sequences of 370 proteins using a sliding L-residue window (L = 5, 9, 11, 15). These segments were classified into several thousand clusters according to their three-dimensional structural resemblance. Cluster DB contains cluster-related information, including image, rank, frequency, secondary structure assignment, and sequence profile. Users can search for a suitable cluster by entering a parameter (PDB ID, dihedral angles, or DSSP symbols) that identifies the backbone structure of a query segment. By analogy to a language, ProSeg can be regarded as a 'structure-sequence dictionary' that contains over 10,000 'protein words'. ProSeg is freely accessible through the Internet (http://riodb.ibase.aist.go.jp/proseg/).
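
The sliding-window segmentation described above can be sketched as follows. This is a minimal illustration and not the ProSeg implementation: the function name `sliding_segments`, the step size of one residue, and the example sequence are assumptions introduced here. Only the window lengths L = 5, 9, 11, 15 come from the text.

```python
from typing import Dict, List, Tuple

# Window lengths L stated in the text.
WINDOW_LENGTHS = (5, 9, 11, 15)


def sliding_segments(sequence: str, length: int, step: int = 1) -> List[Tuple[int, str]]:
    """Return (start_index, segment) pairs for every full window of `length` residues.

    The step of one residue is an assumption; the text does not state the step size.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(sequence) - length
    # range() is empty when the sequence is shorter than the window.
    return [(i, sequence[i:i + length]) for i in range(0, last_start + 1, step)]


def segment_all_lengths(sequence: str) -> Dict[int, List[Tuple[int, str]]]:
    """Segment one primary sequence with each window length in WINDOW_LENGTHS."""
    return {length: sliding_segments(sequence, length) for length in WINDOW_LENGTHS}


if __name__ == "__main__":
    # Hypothetical 9-residue sequence in one-letter amino acid code.
    for start, segment in sliding_segments("ACDEFGHIK", 5):
        print(start, segment)
```

A sequence of N residues yields N - L + 1 segments per window length under this one-residue step, which is consistent with 370 proteins producing tens of thousands of segments.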