Olga Chuchuk scite author profile

Olga Chuchuk

2Publications

11Citation Statements Received

56Citation Statements Given

How they've been cited

How they cite others

Affiliations

Observatoire de la Côte d’Azur, European Organization for Nuclear Research, Université Côte d'Azur

Publications

Order By: Most citations

Methods of Data Popularity Evaluation in the ATLAS Experiment at the LHC

et al. 2021

View full text Add to dashboard Cite

The ATLAS Experiment at the LHC generates petabytes of data that is distributed among 160 computing sites all over the world and is processed continuously by various central production and user analysis tasks. The popularity of data is typically measured as the number of accesses and plays an important role in resolving data management issues: deleting, replicating, moving between tapes, disks and caches. These data management procedures were still carried out in a semi-manual mode and now we have focused our efforts on automating it, making use of the historical knowledge about existing data management strategies. In this study we describe sources of information about data popularity and demonstrate their consistency. Based on the calculated popularity measurements, various distributions were obtained. Auxiliary information about replication and task processing allowed us to evaluate the correspondence between the number of tasks with popular data executed per site and the number of replicas per site. We also examine the popularity of user analysis data that is much less predictable than in the central production and requires more indicators than just the number of accesses.

show abstract

Caching for dataset-based workloads with heterogeneous file sizes

Chuchuk¹,

Neglia²,

Schulz³

et al. 2022

View full text Add to dashboard Cite

Caching can effectively reduce the cost of serving content and improve the user experience. In this paper, we explore the benefits of caching for existing scientific workloads, taking the Worldwide LHC (Large Hadron Collider) Computing Grid as an example. It is a globally distributed system that stores and processes multiple hundred petabytes of data and serves the needs of thousands of scientists around the globe. Scientific computation differs from other applications like video streaming as file sizes vary from a few bytes to terabytes and logical links between the files affect user access patterns. These factors profoundly influence caches' performance and, therefore, should be carefully analyzed to select which caching policy to deploy or to design new ones. In this work, we study how the hierarchical organization of the LHC physics data into files and groups of files called datasets affects the request patterns. We then propose new caching policies that exploit dataset-specific knowledge and compare them with file-based ones. Moreover, we show that limited connectivity between the computing and storage sites leads to the delayed hits phenomenon and estimate the consequent reduction in the potential benefits of caching.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Olga Chuchuk

Methods of Data Popularity Evaluation in the ATLAS Experiment at the LHC

Caching for dataset-based workloads with heterogeneous file sizes

Contact Info

Product

Resources

About