We introduce and describe the Patent Similarity Data set, comprising vector space model-based similarity scores for United States utility patents. The data set provides approximately 640 million pre-calculated similarity scores, as well as the code and computed vectors required to calculate...