2021
DOI: 10.1609/aiide.v11i1.12785
An Empirical Evaluation of Evaluation Metrics of Procedurally Generated Mario Levels

Abstract: There are several approaches in the literature for automatically generating Infinite Mario Bros levels. The evaluation of such approaches is often performed solely with computational metrics such as leniency and linearity. While these metrics are important for an initial exploratory evaluation of the content generated, it is not clear whether they are able to capture the player's perception of the content generated. In this paper we evaluate several of the commonly used computational metrics. Namely, we perfor…

Cited by 9 publications (5 citation statements). References 27 publications.
“…In future work we would like to explore whether metrics can be found which are highly performant across multiple game domains of similar genre. This could synergise well with papers discussed previously which explored the relationship between metrics and player perceptions [11,17,23]. Metric pairs that scored highly in multiple domains using our criteria, while also correlating with human perceptions of quality or diversity would have the potential to be robustly useful in novel PCG domains without the need for evaluation.…”
Section: Limitations and Future Work (supporting)
confidence: 66%
“…Summerville et al explored the power of common metrics when predicting player experience and found strong correlations [23]. Marino et al conducted a similar study and found much more limited correlation [17]. Herve & Salge explored the relationship between common metrics and expert evaluations of Minecraft maps [11].…”
Section: Metric Selection for ERA (mentioning)
confidence: 99%
“…Within the field of PCG, a truly qualitative evaluation of these artifacts is still challenging. Several experiments have already been conducted with the intent to critically examine some of the commonly used metrics and their relevance [10,8]. Although part of the tool set is pertinent in evaluation scenarios, it is also clear that we are missing player-driven evaluation methods.…”
Section: Possibility Space (mentioning)
confidence: 99%
“…For the VGLC representation, the output of the LSTM is connected to a dense layer with Softmax activation indicating the probability of a tile character. We perform an expressive range analysis of generated levels with the metrics Linearity and Leniency (Summerville 2018; Smith et al. 2010; Mariño, Reis, and Lelis 2015).…”
Section: LSTM for Level Generation-Annotated Game (mentioning)
confidence: 99%
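The linearity and leniency metrics recurring in these citation statements can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from any of the cited papers: it assumes a VGLC-style tile grid where '#' marks solid ground and 'E' marks an enemy, and the exact height definition, hazard weighting, and normalization choices vary between Smith et al. (2010) and later work.

```python
# Hedged sketch of linearity and leniency for a VGLC-style level grid.
# Assumed conventions (not from the cited papers): '#' = solid tile,
# 'E' = enemy, an empty bottom-row tile is read as a gap.

def column_heights(level):
    """Height (in tiles) of the topmost solid tile in each column; 0 if empty."""
    rows = len(level)
    heights = []
    for x in range(len(level[0])):
        h = 0
        for y in range(rows):
            if level[y][x] == '#':
                h = rows - y
                break
        heights.append(h)
    return heights

def linearity(level):
    """One common variant: fit a least-squares line through the column
    heights, then return 1 minus the normalized mean absolute error."""
    hs = column_heights(level)
    n = len(hs)
    xs = list(range(n))
    mx = sum(xs) / n
    my = sum(hs) / n
    denom = sum((x - mx) ** 2 for x in xs)
    slope = (sum((x - mx) * (h - my) for x, h in zip(xs, hs)) / denom
             if denom else 0.0)
    intercept = my - slope * mx
    mae = sum(abs(h - (slope * x + intercept)) for x, h in zip(xs, hs)) / n
    return 1.0 - mae / max(hs) if max(hs) else 1.0

def leniency(level):
    """Negative weighted hazard count, normalized by level width; the
    weights here (enemy = 1.0, gap = 0.5) are illustrative only."""
    width = len(level[0])
    enemies = sum(row.count('E') for row in level)
    gaps = level[-1].count(' ')
    return -(enemies * 1.0 + gaps * 0.5) / width
```

For example, a perfectly flat, hazard-free level scores linearity 1.0 and leniency 0.0; adding enemies or gaps pushes leniency below zero, which is the directionality these papers rely on when comparing generators.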