Recent developments in the automated computational design of electromagnetic devices, known as inverse design, have significantly enhanced the design process for nanophotonic systems. Inverse design can both reduce design time considerably and lead to high-performance, nonintuitive structures that would be impossible to develop manually. Despite the successes enjoyed by structure optimization techniques, most approaches leverage electromagnetic solvers that require significant computational resources and suffer from slow convergence and numerical dispersion. Recently, a fast simulation and boundary-based inverse design approach built on boundary integral equations was demonstrated for two-dimensional nanophotonic problems. In this work, we introduce a new full-wave three-dimensional simulation and boundary-based optimization framework for nanophotonic devices that is likewise based on boundary integral methods and achieves high accuracy even at coarse mesh discretizations while requiring only modest computational resources. The approach is further accelerated by GPU computing, a sparse block-diagonal preconditioning strategy, and a matrix-free implementation of the discrete adjoint method. As a demonstration, we optimize three different devices: a 1:2 power splitter for 1550 nm and two nonadiabatic mode-preserving waveguide tapers. To the best of our knowledge, the tapers, which span 40 wavelengths in the silicon material, are the largest silicon photonic waveguiding devices to have been optimized using a full-wave 3D solution of Maxwell's equations.