We have developed photocrosslinked films from blends of polysilanes with a diphenyl- or dinaphthylfluorene bearing epoxy and oxetane moieties, cured in the presence of a photoacid generator by irradiation at 405 nm. Photo-induced decomposition of the Si-Si bonds of the polysilanes was successfully suppressed during the visible-light irradiation. The cationic photocrosslinking behavior of the blends was strongly affected by the post-exposure-bake conditions and the irradiation dose. Polysilane moieties were incorporated into the films by the termination reaction of the polymerization with their terminal OH groups. We successfully fabricated films with high refractive indices (nd = 1.70), and the refractive index could be tuned by irradiation at 254 nm owing to decomposition of the Si-Si bonds of the polysilanes.
Cache miss stalls are one of the major sources of performance bottlenecks for multicore processors. A Hardware Performance Monitor (HPM) in the processor is useful for locating the cache misses, but it is rarely used in the real world for various reasons. It would be preferable to have a simple approach that locates the sources of cache misses and applies runtime optimizations without relying on an HPM. This paper shows that pointer dereferencing in hot loops is a major source of cache misses in Java programs. Based on this observation, we devised a new approach to identify the instructions and objects that cause frequent cache misses. Our heuristic technique effectively identifies the majority of the cache misses in typical Java programs by matching the hot loops against simple idiomatic code patterns. On average, our technique selected only 2.8% of the load and store instructions generated by the JIT compiler, and these instructions accounted for 47% of the L1D cache misses and 49% of the L2 cache misses caused by the JIT-compiled code. To demonstrate the effectiveness of our technique in compiler optimizations, we prototyped object placement optimizations, which align objects on cache-line boundaries or collocate paired objects in the same cache line to reduce cache misses. For comparison, we also implemented the same optimizations based on the accurate information obtained from the HPM. Our results showed that our heuristic approach was as effective as the HPM-based approach and achieved comparable performance improvements on the SPECjbb2005 and SPECpower_ssj2008 benchmark programs.
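As an illustration of the kind of pattern such a heuristic targets, the following Java sketch shows a hot loop dominated by pointer dereferencing; the class and field names (Order, Item, next, item) are invented for this example and do not come from the paper.

```java
// Hypothetical example of a pointer-dereferencing hot loop: each iteration
// chases a next pointer and dereferences a second object, so the field loads
// are the kind of instructions a JIT-side heuristic might flag as likely
// L1D/L2 cache-miss sources.
final class Order {
    Order next;      // pointer chased on every iteration
    Item  item;      // second dereference per iteration
    int   quantity;
}

final class Item {
    int unitPrice;
}

final class HotLoopExample {
    // Walks a linked list of orders; the loads of o.next, o.item, and
    // item.unitPrice touch objects that may be scattered across the heap.
    static long totalPrice(Order head) {
        long total = 0;
        for (Order o = head; o != null; o = o.next) {
            Item item = o.item;                       // pointer dereference
            total += (long) item.unitPrice * o.quantity;
        }
        return total;
    }
}
```

In terms of the object placement optimizations described above, collocating each Order with its Item in the same cache line, or aligning such objects on cache-line boundaries, would be expected to reduce the misses incurred by these loads.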
When optimizing large-scale applications, striking the balance between steady-state performance, start-up time, and code size has always been a grand challenge. While recent advances in trace compilation have significantly improved the steady-state performance of trace JITs for large-scale Java applications, the size-control aspect of a trace compilation system remains largely overlooked. For instance, using the DaCapo 9.12 benchmarks, we observe that 40% of the traces selected by a state-of-the-art trace selection algorithm are short-lived and that, on average, each selected basic block is replicated 13 times in the trace cache. This paper studies the size control problem for a class of commonly used trace selection algorithms and proposes six techniques to reduce the footprint of trace selection without incurring any performance loss. The crux of our approach is to target redundancies in trace selection in the form of either short-lived traces or unnecessary trace duplication. Using one of the best-performing selection algorithms as the baseline, we demonstrate that, on the DaCapo 9.12 benchmarks and DayTrader 2.0 on WebSphere Application Server 7.0, our techniques reduce the code size and compilation time by 69% and the start-up time by 43% while retaining the steady-state performance. On DayTrader 2.0, an example of a large-scale application, our techniques also improve the steady-state performance by 10%.
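The notions of basic-block replication and short-lived traces can be made concrete with a small sketch. The following Java code is not taken from the paper; the Trace record, its execCount field, and the threshold are assumptions made only for this illustration. It computes the average number of copies of each basic block across the selected traces and filters out traces whose execution count falls below a cutoff.

```java
import java.util.*;

// Minimal sketch (not the paper's algorithm) of measuring basic-block
// replication in a trace cache and of identifying short-lived traces.
final class TraceCacheStats {
    // A trace is modelled as its execution count plus the IDs of the
    // basic blocks it contains; both names are hypothetical.
    record Trace(long execCount, List<Integer> blockIds) {}

    // Average number of times each distinct basic block appears across
    // all selected traces (the replication factor mentioned above).
    static double replicationFactor(List<Trace> traces) {
        Map<Integer, Integer> copies = new HashMap<>();
        for (Trace t : traces)
            for (int b : t.blockIds())
                copies.merge(b, 1, Integer::sum);
        return copies.values().stream()
                     .mapToInt(Integer::intValue)
                     .average()
                     .orElse(0.0);
    }

    // Treats traces executed fewer than `threshold` times as short-lived,
    // i.e. candidates for eviction or for not being compiled at all.
    static List<Trace> shortLived(List<Trace> traces, long threshold) {
        List<Trace> result = new ArrayList<>();
        for (Trace t : traces)
            if (t.execCount() < threshold) result.add(t);
        return result;
    }
}
```

A selection scheme that keeps both quantities low is what the proposed size-control techniques aim for while preserving steady-state performance.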
Modern processors support hardware-assist instructions (such as the TRT and TROT instructions on IBM zSeries) to accelerate certain functions such as delimiter search and character conversion. Such special instructions have often been used in high-performance libraries, but they have not been exploited well in optimizing compilers except in some limited cases. We propose a new idiom recognition technique derived from a topological embedding algorithm [4] to detect idiom patterns in the input program more aggressively than previous approaches. Our approach can detect a pattern even if the code segment does not exactly match the idiom. For example, we can detect a code segment that includes additional code within the idiom pattern. We implemented our new idiom recognition approach in the Java Just-In-Time (JIT) compiler that is part of the J9 Java Virtual Machine, and we supported several important idioms for special hardware-assist instructions on the IBM zSeries and on some models of the IBM pSeries. To demonstrate the effectiveness of our technique, we performed two experiments. The first measures how many more patterns we can detect compared to the previous approach. The second measures how much performance improvement we can achieve over the previous approach. For the first experiment, we used the Java Compatibility Kit (JCK) API tests. For the second, we used the IBM XML parser, SPECjvm98, and SPECjbb2000. In summary, relative to a baseline implementation using exact pattern matching, our algorithm converted 75% more loops in the JCK tests. We also observed a significant performance improvement of 64% for the XML parser, along with improvements of 1% for SPECjvm98 and 2% for SPECjbb2000 on average, on a z990. Finally, we observed JIT compilation time increases of only 0.32% to 0.44%.
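To make the idiom-matching idea concrete, the sketch below shows two Java loops of the delimiter-search shape mentioned above; the method and variable names are invented for this illustration. The first loop matches the idiom exactly and could be mapped to a hardware scan instruction such as TRT; the second contains additional code inside the loop body, which defeats exact matching but still embeds the delimiter-search skeleton that a topological-embedding-based matcher can detect.

```java
// Illustrative Java loops of the delimiter-search shape; the surrounding
// names are invented for this sketch.
final class DelimiterSearch {
    // Plain idiom: scan for the first delimiter. A loop of exactly this
    // shape is the kind of pattern an exact matcher can map to a hardware
    // scan instruction such as TRT on zSeries.
    static int indexOfDelimiter(byte[] buf, byte delim) {
        int i = 0;
        while (i < buf.length && buf[i] != delim) {
            i++;
        }
        return i;   // == buf.length when no delimiter was found
    }

    // Same idiom with extra work inside the loop body (a running checksum).
    // Exact matching fails here, but a topological-embedding style matcher
    // can still recognize the delimiter-search pattern embedded in the loop.
    static int indexOfDelimiterWithChecksum(byte[] buf, byte delim, int[] checksumOut) {
        int i = 0, sum = 0;
        while (i < buf.length && buf[i] != delim) {
            sum += buf[i] & 0xFF;   // additional code inside the idiom
            i++;
        }
        checksumOut[0] = sum;
        return i;
    }
}
```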
More and more server workloads are becoming Web-based. In these Web-based workloads, most of the memory objects are used only during one transaction. We study the effect of memory management approaches on the performance of such Web-based applications on two modern multicore processors. In particular, using six PHP applications, we compare a general-purpose allocator (the default allocator of the PHP runtime) and a region-based allocator, which can reduce the cost of memory management by not supporting per-object free. The region-based allocator achieves better performance for all workloads on one processor core due to its smaller memory management cost. However, when using eight cores, the region-based allocator suffers from the hidden cost of increased bus traffic, and performance is reduced for many workloads by as much as 27.2% compared to the default allocator. This is because memory bandwidth tends to become a bottleneck in systems with multicore processors. We propose a new memory management approach, defrag-dodging, to maximize the performance of Web-based workloads on multicore processors. In our approach, we reduce the memory management cost by avoiding defragmentation overhead in the malloc and free functions during a transaction. We found that the transactions in Web-based applications are short enough to ignore heap fragmentation, and hence the costs of the defragmentation activities in existing general-purpose allocators outweigh their benefits. By comparing our approach against the region-based approach, we show that a per-object free capability can reduce bus traffic and achieve higher performance on multicore processors. We demonstrate that our defrag-dodging approach improves the performance of all the evaluated applications on both processors by up to 11.4% and 51.5% over the default allocator and the region-based allocator, respectively.
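To illustrate why region-based allocation is cheap per request, the following minimal Java sketch models a bump-pointer region that is reset at the end of each transaction. The class and method names are hypothetical; the paper's allocators are native allocators inside the PHP runtime, not Java code.

```java
// Minimal bump-pointer "region" sketch: allocation is a pointer bump and
// there is no per-object free, which is what keeps the per-request cost low.
final class Region {
    private final byte[] arena;
    private int top;                     // bump pointer

    Region(int capacityBytes) {
        this.arena = new byte[capacityBytes];
        this.top = 0;
    }

    // Returns the offset of a freshly reserved block inside the arena.
    int allocate(int size) {
        if (top + size > arena.length) {
            throw new OutOfMemoryError("region exhausted");
        }
        int offset = top;
        top += size;
        return offset;
    }

    // All objects die together when the transaction ends; individual
    // objects are never freed.
    void resetAtTransactionEnd() {
        top = 0;
    }
}
```

In contrast, the defrag-dodging approach described above retains per-object free, which is what allows it to reuse recently freed memory and keep bus traffic low, while skipping defragmentation work during a transaction.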