A Study on the Use of Checksums for Integrity Verification of Web Downloads

Meylan, Alexandre; Cherubini, Mauro; Chapuis, Bertil; Humbert, Mathias; Bilogrevic, Igor; Huguenin, Kévin

doi:10.1145/3410154

Cited by 3 publications

(3 citation statements)

References 34 publications

“…Finally, through a user survey (𝑁 = 2000), they show that most internet users do not understand the purpose of checksums found on download pages. In a follow-up study, Meylan et al. [13] studied, through an in-the-wild experiment (𝑁 = 134), the exposure of internet users to checksums and their reactions.…”

Section: Related Work (mentioning)

confidence: 99%

“…To enable researchers and practitioners to reproduce our work and to benefit from its result, we make our dataset and the code for collecting and analyzing it available on OSF. More specifically, we provide (1) the pre-trained model (in the PMML format) of our classifier (for identifying webpages w/ checksums), together with a minimal example of how to use it in Python + sklearn, (2) our enriched dataset of webpages with checksums, together with the annotations (csv and sqlite), (3) the code of our crawler, and (4) the full transcript of our questionnaire.…”

Section: Dissemination Of the Data And Code (mentioning)

confidence: 99%

An Empirical Study of the Usage of Checksums for Web Downloads

Bernard, Coudert, Chapuis, et al. 2023

Proceedings of the ACM Web Conference 2023

Self Cite

Checksums, typically provided on webpages and generated from cryptographic hash functions (e.g., MD5, SHA256) or signature schemes (e.g., PGP), are commonly used on websites to enable users to verify that the files they download have not been tampered with when stored on possibly untrusted servers. In this paper, we elucidate the current practices regarding the usage of checksums for web downloads (hash functions used, visibility and validity of checksums, type of websites and files, etc.), as this has been mostly overlooked so far. Using a snowball-sampling strategy for the 200,000 most popular domains of the Web, we first crawled a dataset of 8.5M webpages, from which we built, through an active-learning approach, a unique dataset of 277 diverse webpages that contain checksums. Our analysis of these webpages reveals interesting findings about the usage of checksums. For instance, it shows that checksums are used mostly to verify program files, that weak hash functions are frequently used, and that a non-negligible proportion of the checksums provided on webpages do not match that of their associated files. Finally, we complement our analysis with a survey of the webmasters of the considered webpages (𝑁 = 26), thus shedding light on the reasons behind the checksum-related choices they make.

CCS Concepts: Security and privacy → Web protocol security; Hash functions and message authentication codes.
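
As an illustration of the verification step that the abstract above refers to, the following is a minimal sketch of how a user might compare a downloaded file against a checksum published on a webpage. It assumes the page publishes a hex-encoded SHA-256 digest; the function names `file_sha256` and `matches_published_checksum` are hypothetical and do not come from either paper.

```python
import hashlib
import hmac
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: Path, published_hex: str) -> bool:
    """Compare the computed digest with one copied from a download page.

    Assumption: the published value is a hex string, possibly with
    surrounding whitespace or uppercase letters.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(file_sha256(path), expected)
```

A mismatch only indicates that the file and the published value differ. As the abstract notes, some published checksums do not match their associated files, so a mismatch does not by itself show which of the two is wrong.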

“…Checksumming is the most basic implementation of end-to-end data integrity [14]. Both sink and source endpoints retrieve the data from storage after a successful data transfer and then calculate the checksum with a hash technique including SHA1 [15] or MD5 [16].…”

Section: Introduction (mentioning)

confidence: 99%

TPBF: Two-Phase Bloom-Filter-Based End-to-End Data Integrity Verification Framework for Object-Based Big Data Transfer Systems

2022

Computational science simulations produce huge volumes of data for scientific research organizations. Often, this data is shared by data centers distributed geographically for storage and analysis. Data corruption in the end-to-end route of data transmission is one of the major challenges in distributing the data geographically. End-to-end integrity verification is therefore critical for transmitting such data across data centers effectively. Although several data integrity techniques currently exist, most have a significant negative influence on the data transmission rate as well as the storage overhead. Therefore, existing data integrity techniques are not viable solutions in high performance computing environments where it is very common to transfer huge volumes of data across data centers. In this study, we propose a two-phase Bloom-filter-based end-to-end data integrity verification framework for object-based big data transfer systems. The proposed solution effectively handles data integrity errors by reducing the memory and storage overhead and minimizing the impact on the overall data transmission rate. We investigated the memory, storage, and data transfer rate overheads of the proposed data integrity verification framework on the overall data transfer performance. The experimental findings showed that the suggested framework had 5% and 10% overhead on the total data transmission rate and on the total memory usage, respectively. However, we observed significant savings in terms of storage requirements, when compared with state-of-the-art solutions.

Enhancing Data Security through Hybrid Error Detection: Combining Cyclic Redundancy Check (CRC) and Checksum Techniques

Hadi Saleh, Mohammed 2024

IJEER

Error detection is critical to ensuring the accuracy of data transmission in communication systems. This study investigates the performance of two error detection techniques, Cyclic Redundancy Check (CRC) and Checksum, when combined through a new process to achieve a Bit Error Rate (BER) of 10^(-5) for single and multiple error detection. When CRC and Checksum were combined, overall error detection performance improved significantly compared to using either technique alone. Specifically, the combined technique achieved a BER of 10^(-5) for 6 given examples with higher accuracy and lower false positive rates. These findings demonstrate the potential benefits of combining error detection techniques to enhance the reliability of data transmission systems. The combinations were demonstrated using both VHDL and Python to identify unexpected system behavior before deployment. The combination process uses only 72 bits of memory and takes 1 millisecond to complete checking and detection. The calculation steps and waveforms were simulated in Python to verify the overall combination steps. In addition, the paper provides a novel method for polynomial generation based on the IP addresses of trusted sites. This approach to the CRC generator is unique and provides two steps of protection for users on the same or different networks.
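
The abstract does not specify how the two techniques are combined, so the following is only a generic sketch of pairing a CRC with a checksum, not the paper's method. It assumes the standard CRC-32 from Python's `zlib` and a 16-bit one's-complement checksum in the style of RFC 1071; the names `internet_checksum`, `make_tag`, and `verify` are hypothetical.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def make_tag(data: bytes) -> tuple[int, int]:
    """Pair a CRC-32 with a 16-bit checksum (illustrative combination only)."""
    return zlib.crc32(data), internet_checksum(data)


def verify(data: bytes, tag: tuple[int, int]) -> bool:
    """Accept the data only if both the CRC and the checksum match."""
    return make_tag(data) == tag
```

In this sketch the receiver recomputes both values and rejects the data if either differs. Neither value is cryptographic, so the pair detects accidental corruption but offers no protection against deliberate tampering.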
