P2G: A Framework for Distributed Real-Time Processing of Multimedia Data

Espeland, Håvard; Beskow, Paul B.; Stensland, Håkon Kvale; Olsen, Pelle Valentin; Kristoffersen, Ståle B; Griwodz, Carsten; Halvorsen, Pål

doi:10.1109/icppw.2011.22

Cited by 12 publications

(7 citation statements)

References 30 publications


“…The implication of the very different behaviors observed on the different microarchitectures is a demand for scheduling that can adapt to varying conditions using instrumentation data collected at runtime. Our next step is therefore to design such a low-level scheduler in the context of our P2G processing framework [7] using our improvements for the work-stealing approach as a starting point [17].…”

Section: Discussion (mentioning)

confidence: 99%

“…However, these frameworks are limited by their design for batch processing with few dependencies across a large cluster of machines. We are therefore currently working on a framework aimed for distributed real-time multimedia processing called P2G [7]. In this work, we have identified several challenges with respect to low level scheduling.…”

Section: Introduction (mentioning)

confidence: 99%


Low-Level Scheduling Implications for Data-Intensive Cyclic Workloads on Modern Microarchitectures

Espeland, Olsen, Halvorsen, et al. 2012

2012 41st International Conference on Parallel Processing Workshops

Self Cite


Abstract: Processing data intensive multimedia workloads is challenging, and scheduling and resource management are vitally important for the best possible utilization of machine resources. In earlier work, we have used work-stealing, which is frequently used today, and proposed improvements. We found already then that no singular work-stealing variant is ideally suited for all workloads. Therefore, we investigate in more detail in this paper how workloads consisting of various multimedia filter sequences should be scheduled on a variety of modern processor architectures to maximize performance. Our results show that a low-level scheduler additionally cannot achieve optimal performance without taking the specific micro-architecture, the placement of dependent tasks and cache sizes into account. These details are not generally available for application developers and they differ between deployments. Our proposal is therefore to use performance monitoring and dynamic adaption for the cyclic workloads of our target multimedia scenario, where operations are repeated cyclically on a stream of data.


“…18 Creates a pair of orientation matrices (X and Y) and stores results. 19 Fetches orientation matrices, scales, and orientations. Creates odd and even Gabor filters and stores results.…”

Section: Aliasing Vs Replication (mentioning)

confidence: 99%

“…The system that we imagine draws from ideas presented in our P2G proof-of-concept [19], which exposed parallelization and scheduled minimal code blocks at runtime. This approach is only feasible for coarse parallelism, whereas the P2G framework can expose parallelism at the finest possible granularity.…”

Section: System Description (mentioning)

confidence: 99%

A logical memory model for scaling parallel multimedia workloads

Olsen, Nyhus, Halvorsen, et al. 2015

Proceedings of the 25th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video

Self Cite


The growing power of processors allows us to implement increasingly complex multimedia algorithms. However, this processor power is only available if the algorithms are implemented in a way that exploits the multi-core parallelism of these processors. Today, this requires that the skillsets required for algorithm development and for parallel programming are tightly combined to achieve this. By providing a language, compiler and runtime that allows algorithm developers to specify algorithms as a series of data-transforming kernels written in C++, while the parallelization opportunities are built into the compiler and runtime, we hope to alleviate this need for a dual skillset. In this paper, we focus on the performance improvements that our system can achieve by combining language design, compiler knowledge, and runtime decisions to overcome performance bottlenecks from fine-grained kernel scheduling and cache-line contention without adapting the algorithms they implement.


“…Large processing frameworks like Google's MapReduce [1] and Microsoft's Dryad [3] are steps in the right direction, but they are targeted towards batch processing. As such, we present P2G [2], a framework designed to integrate concepts from modern batch processing frameworks into the world of real-time multimedia processing. We seek to scale transparently with the available resources (following the cloud computing paradigm) and to support heterogeneous computing resources, such as GPU processing cores.…”

Section: Introduction (mentioning)

confidence: 99%

Processing of multimedia data using the P2G framework

Beskow, Stensland, Espeland, et al. 2011

Proceedings of the 19th ACM International Conference on Multimedia

Self Cite


In this demo, we present the P2G framework designed for processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines. P2G is implemented to scale transparently with available resources, i.e., a concept familiar from the cloud computing paradigm. Additionally, P2G supports heterogeneous computing resources, such as x86 and GPU processing cores. We have implemented an interchangeable P2G kernel language which is meant to expose fundamental concepts of the P2G programming model and ease the application development. Here, we demonstrate the P2G execution node using a MJPEG encoder as an example workload when dynamically adding and removing processing cores.
