Thomas M Kroeger scite author profile

Thomas M Kroeger

5 Publications

81 Citation Statements Received

55 Citation Statements Given


Affiliations

Sandia National Laboratories, Sandia National Laboratories California, University of California, Santa Cruz

Publications


The case for efficient file access pattern modeling

Kroeger, Long


Most modern I/O systems treat each file access independently. However, events in a computer system are driven by programs. Thus, accesses to files occur in consistent patterns and are by no means independent. The result is that modern I/O systems ignore useful information. Using traces of file system activity, we show that file accesses are strongly correlated with preceding accesses. In fact, a simple last-successor model (one that predicts each file access will be followed by the same file that followed the last time it was accessed) successfully predicted the next file 72% of the time. We examine the ability of two previously proposed models for file access prediction in comparison to this baseline model and see a stark contrast in accuracy and high overheads in state space. We then enhance one of these models to address the issues of model space requirements. This new model is able to improve an additional 10% on the accuracy of the last-successor model, while working within a state space that is within a constant factor (relative to the number of files) of the last-successor model. While this work was motivated by the use of file relationships for I/O prefetching, information regarding the likelihood of file access patterns has several other uses, such as disk layout and file clustering for disconnected operation.
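The last-successor model described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name, the toy trace, and the choice to score only accesses for which a prediction exists are all assumptions made here.

```python
def last_successor_accuracy(trace):
    """Fraction of predictions that were correct under a last-successor model.

    Assumption: an access is only scored once the file has been seen before,
    so the first occurrence of each file produces no prediction.
    """
    last_successor = {}  # file -> the file that followed it most recently
    correct = 0
    predictions = 0
    for current, following in zip(trace, trace[1:]):
        predicted = last_successor.get(current)
        if predicted is not None:
            predictions += 1
            if predicted == following:
                correct += 1
        # Remember what actually followed, for the next time we see `current`.
        last_successor[current] = following
    return correct / predictions if predictions else 0.0


# Hypothetical trace of file names, for illustration only.
trace = ["a", "b", "c", "a", "b", "c", "a", "d"]
print(last_successor_accuracy(trace))  # prints 0.75
```

The model keeps one entry per file, which is why its state space grows linearly with the number of files.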


Fourier-assisted machine learning of hard disk drive access time models

Crume, Maltzahn, Ward, et al. 2013


Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. Others have created behavioral models of hard disk drive performance, but none have shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We show how hard disk drive access times can be predicted to within 0.83 ms using a neural net after these frequencies are found using Fourier analysis.
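The step of finding an unknown periodic frequency with Fourier analysis can be illustrated on synthetic data. This is only a sketch under stated assumptions: the signal below is made up, the function name is hypothetical, and a naive discrete Fourier transform stands in for whatever analysis the authors used. The neural-net stage is not shown.

```python
import cmath
import math


def dominant_frequency(samples):
    """Return the DFT bin (cycles per window) with the largest magnitude.

    The mean is subtracted first so the constant (DC) component is ignored.
    """
    n = len(samples)
    mean = sum(samples) / n
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            (x - mean) * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin


# Synthetic "access times" (assumed values): a 5 ms base plus a periodic
# component that completes 8 cycles over the 64-sample window.
samples = [5.0 + 2.0 * math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
print(dominant_frequency(samples))  # prints 8
```

Once such a frequency is known, it could be supplied to a learner as an explicit periodic feature rather than leaving the model to discover it.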


RESAR: Reliable Storage at Exabyte Scale

Schwarz, Amer, Kroeger, et al. 2016


Large-scale disk-based storage systems need to protect the data stored in them against individual disk failures, common component failures, and latent disk errors. We present RESAR, a layout scheme that provides two-failure tolerance using only XOR operations to calculate parity data. Our layout has the same write overhead as that of a disk array whose layout is based on virtual RAID Level 6 disk arrays. If the size of a reliability stripe is k, our write overhead is 2/k. We show that RESAR is actually more resilient than the RAID Level 6 layout.
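For readers unfamiliar with XOR parity, the sketch below shows the generic idea on a single stripe, together with the 2/k write overhead stated in the abstract. It is not the RESAR layout itself: the block contents, function names, and single-stripe setup are assumptions for illustration, and a single XOR parity block on its own tolerates only one lost block.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_overhead(k):
    """Write overhead 2/k for a reliability stripe of size k, as stated in the abstract."""
    return 2 / k


# Hypothetical stripe of three equal-length data blocks.
stripe = [b"disk", b"data", b"here"]
parity = xor_parity(stripe)

# Simulate losing the middle block, then rebuild it from the survivors and parity.
recovered = xor_parity([stripe[0], stripe[2], parity])
print(recovered == stripe[1])  # prints True
print(write_overhead(8))       # prints 0.25
```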


Advanced Data Structures for Improved Cyber Resilience and Awareness in Untrusted Environments: LDRD Report

Bender, Berry, Farach-Colton, et al. 2018


Virtually the Same: Comparing Physical and Virtual Testbeds

Crussell, Kroeger, Brown, et al. 2019


Network designers, planners, and security professionals increasingly rely on large-scale testbeds based on virtualization to emulate networks and make decisions about real-world deployments. However, there has been limited research on how well these virtual testbeds match their physical counterparts. Specifically, does the virtualization that these testbeds depend on actually capture real-world behaviors sufficiently well to support decisions? As a first step, we perform simple experiments on both physical and virtual testbeds to begin to understand where and how the testbeds differ. We set up a web service on one host and run ApacheBench against this service from a different host, instrumenting each system during these tests. We define an initial repeatable methodology (algorithm) to quantitatively compare physical and virtual testbeds. Specifically, we compare the testbeds at three levels of abstraction: application, operating system (OS), and network. For the application level, we use the ApacheBench results. For OS behavior, we compare patterns of system call orderings using Markov chains. This provides a unique visual representation of the workload and OS behavior in our testbeds. We also drill down into read-system-call behaviors and show how at one level both systems are deterministic and identical, but as we move up in abstractions that consistency declines. Finally, we use packet captures to compare network behaviors and performance. We reconstruct flows and compare per-flow and per-experiment statistics. From these comparisons, we find that the behavior of the workload in the testbeds is similar but that the underlying processes to support it do vary. The low-level network behavior can vary quite widely in packetization depending on the virtual network driver. While these differences can be important, and knowing about them will help experiment designers, the core application and OS behaviors still represent similar processes.
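Modeling system call orderings as a Markov chain amounts to estimating transition probabilities between consecutive calls. The sketch below shows one way to do that, assuming a first-order chain; the function name and the toy call sequence are hypothetical and are not taken from the paper's traces.

```python
from collections import Counter, defaultdict


def transition_probabilities(calls):
    """Estimate first-order Markov transition probabilities from a call sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(calls, calls[1:]):
        counts[current][following] += 1
    return {
        state: {
            nxt: count / sum(successors.values())
            for nxt, count in successors.items()
        }
        for state, successors in counts.items()
    }


# Hypothetical system call sequence, for illustration only.
calls = ["open", "read", "read", "close", "open", "read", "close"]
probs = transition_probabilities(calls)
print(probs["open"])   # prints {'read': 1.0}
print(probs["close"])  # prints {'open': 1.0}
```

Building one such table per testbed would let the two be compared state by state, for example by looking at which transitions appear in only one of them.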


